Foreword


The International Summer School on Statistical Distributions in Scientific Work 
was held in Trieste during July 1980 for a period of three weeks. The emphasis was on 
research, review, and exposition concerned with the interface between modern 
statistical distribution theory and real world problems and issues involving science, 
technology, and management. Both theory and applications received full attention at 
the School. The program consisted of a Short Intensive Preparation Course, a NATO 
Advanced Study Institute, and a Research Conference. While the relative composi- 
tion of these activities varied somewhat in terms of instruction, exposition, research- 
review, research, and consultation, the basic spirit of each was essentially the same. 
Every participant was both a professor and a student. 

The summer school was sponsored by the NATO Advanced Study Institutes 
Program; Consiglio Nazionale delle Ricerche, Italy; Regione Autonoma Friuli Ven- 
ezia Giulia, Italy; National Institutes of Health, USA; Office of Naval Research, 
USA; The Pennsylvania State University; Universita di Roma; Universita di Trieste; 
International Statistical Ecology Program; International Transfer of Science and 
Technology, Belgium; and the participants and their home institutions and organiza- 
tions. 

Research papers, research-review expositions and instructional lectures were spe- 
cially prepared for the program. These materials have been refereed and revised, and 
are now available in a series of several edited volumes and monographs. 


BACKGROUND 


It is now close to two decades since the International Symposium on Classical and 
Contagious Distributions was held in Montreal in 1963. It was the first attempt to 
identify the area of discrete distributions as a subject area by itself. The symposium 
was a great success in that it stimulated growth in the field and more importantly 
provided a certain direction to it. Next came the Biometric Society Symposium on 
Random Counts in Scientific Work at the annual meetings of the American Association
for the Advancement of Science held in 1968. The first symposium had
emphasized models and structures; the second one focused its attention on the useful role
of discrete distributions in applied work.

Seven years ago, a Modern Course on Statistical Distributions in Scientific Work 
was held at the University of Calgary in 1974 under sponsorship of the NATO 
Scientific Affairs Division. The Program consisted of an Advanced Study Institute 
(ASI) followed by a Research Conference on Characterizations of Statistical Distribu- 
tions. The purpose of the ASI was to provide an open forum with focus on different 
aspects of statistical distributions arising in scientific or statistical work. The purpose 
of the characterizations conference was to bring together research workers investigat- 
ing characterization problems that have motivation in scientific concepts and formula- 
tions or that have application or potential use for statistical theory. The program was a 
great success. Participants still remember it very fondly for its scientific impact and its 
social and professional contact. 


CALGARY PROGRAM


The edited Proceedings of the Calgary Program consist of three substantive 
volumes. They have been acknowledged to include a wealth of material ranging over 
a broad spectrum of the theory and applications of distributions and families of 
distributions. Most papers have been acknowledged for their content by reviewers in 
professional journals. The reviews have on the whole stressed the importance of these 
Proceedings as a successful effort to unify the field and to focus on main achievements 
in the area. Moreover, many of the papers which appeared in the Proceedings have 
been, and continue to be, quoted extensively in recent research publications. The 
Calgary Program of 1974 has had a definite and positive impact on stimulating further 
developments in the field of statistical distributions and their applications. 

At the same time, essentially for economic reasons, the sciences, technology, and 
society are recognizing ever-expanding needs for quantification. The random quan- 
tities arising in conceptualization and modeling, in simulation, in data analysis, and in 
decision-making lead increasingly to various kinds of distributional problems and 
requests for solution. Statistical distributions remain an important and focal area of 
study. It is no surprise that the subject area of statistical distributions in scientific work 
is still advancing steadily. 

Interestingly, the Calgary participants perceived this future need and concern. In 
anticipation, several prominent participants formed a Committee on Statistical Dis- 
tributions in Scientific Work to discuss future plans and activities that would help 
consolidate and strengthen the subject area of statistical distributions and its applica- 
tions on a continuing basis. The Committee identified the following needs and 
activities: (i) Preparation of a Comprehensive Dictionary and Bibliography of Statis- 
tical Distributions in Scientific Work, (ii) Preparation of Monographs and Modules 
on Important Distributions, Concepts, and Methods with Applications, and (iii) 
Planning and Organization of a Sequel to the Calgary Program.


DISTRIBUTIONAL ACTIVITIES 


A well sustained seven year effort has produced a comprehensive three-volume set 
entitled A Modern Dictionary and Bibliography of Statistical Distributions in Scien- 
tific Work. The three volumes are: Volume 1, Discrete Models; Volume 2, Continu- 
ous Univariate Models; and Volume 3, Multivariate Models. The Dictionary covers 
several hundred distributional models and gives wherever possible their genesis, 
structural properties and parameters, random number generations, tabulations, 
graphs, and inter-relations through verbal statements as well as schematic diagrams. 
The Bibliography covers over ten thousand publications. Besides the usual reference 
information, each entry provides users listing (citation index), reviews, classification 
by distribution, inference and application, plus any special notes. The massive effort 
by the dictionary bibliography team consisting of M. T. Boswell, S. W. Joshi, G. P. 
Patil, M. V. Ratnaparkhi, and J. J. J. Roux needs to be specially acknowledged. So 
also the continuing interest and response of the professional community. It is hoped 
that the dictionary and bibliography effort will be a continuing activity serving the 
community with updated information from time to time. 

On the monographs front, a lucid volume by J. B. Douglas, entitled Analysis with
Standard Contagious Distributions, has been published. It should be of value to all 
those who are working with contagious distributions in one context or the other. More 
monographs are under preparation as follows: 

Aitchison, J.: Distributions on the Simplex and Their Applications

Arnold, B. C.: Pareto Distributions and Applications 

Cobb, L.: Catastrophe Theory and Distributional Problems 

Folks, J. L. and Chhikara, R. S.: Inverse Gaussian Distribution and Applications 

Mosimann, J. E.: Analysis Using Size and Shape Variables 

Ord, J. K. and Patil, G. P.: Introduction to Probability and Statistical Modeling

Regarding the planning and organization of a sequel to the Calgary Program, the 
NATO Advanced Study Institutes Program encouraged part of the Committee to meet 
and assisted the Committee to have in-depth discussions at Parma, Italy, in 1978. The
following members were in attendance: B. A. Baldessari, T. Cacoullos, S. Engen, 
S. Kotz, J. E. Mosimann, J. K. Ord, G. P. Patil, C. Taillie, J. Tiago de Oliveira, 
W. G. Warren, and M. E. Wise. The intensive and open deliberations proved to be 
very constructive. The Committee felt unanimously that a follow-up to the Calgary 
ASI was very much needed, and that it should be held in 1980. Several institutions 
offered to host such an ASI. It was decided that the program be held in Italy. Bruno 
Baldessari and Livia Rondini assured the necessary support in this connection. 


TRIESTE PROGRAM 


A major purpose of the program was to give a unified and integrated view of 
different classes of distributions and to describe novel methodologies related to 
statistical distributions and/or their applications. Also, contributions on the descrip- 
tion and characterization of distributions which are useful in a variety of fields of 
application were welcomed. 

An application was prepared for the NATO ASI Program with G. P. Patil as the 
Chairman of the Organizing Committee, with B. Baldessari as the Director and 
C. Taillie as the Co-Director, with S. Kotz, J. E. Mosimann, J. K. Ord, and G. P. 
Patil as the Scientific Directors, and with L. Rondini as the Host. The NATO ASI 
program provided a positive response. Requests for the additional support needed 
were granted from within Italy and the USA. Participants and their institutions also 
extended a helping hand. 

Spread over the three week period, the School had over 140 scientific participants 
and 50 accompanying persons from various countries around the world. The scientific 
program was more than full, and yet the overall program had a relaxing touch. 
Everything that the hosts, L. Rondini, A. Kostoris, S. Orviati, M. Strassoldo, M. 
Umani, and E. Feoli, did has been simply sweet and gratifying. 

The Trieste program was a great success. Many have wondered as to when it would 
be again that they would meet and participate in another timely activity on statistical 
distributions in scientific work. If you have any thoughts or suggestions, please do not 
hesitate to let us know. I look forward to hearing from you. 


April 30, 1981 G. P. Patil 


Program Acknowledgments 


For any program to be successful, mutual understanding and support among all 
participants are essential in directions ranging from critical to constructive and from 
cautious to constructive. The present program is grateful to the members of the 
Committee, and to the referees, advisors, sponsors and the participants for their 
timely advice and support. 

Trieste is a beautiful place and so is the surrounding region. The Mediterranean 
around, the mountains nearby, and the campus on the top of a mountain provide a very 
scenic mosaic conducive for scholarship and communication. Italy has had a long 
tradition of research on distributional problems and related issues arising from 
uncertainty. It was only natural that the International Summer School on Statistical 
Distributions in Scientific Work met at Trieste. 


Preface


These three volumes constitute the edited Proceedings of the NATO Advanced 
Study Institute on Statistical Distribution Theory and its Applications held at the 
University of Trieste from July 10 to August 1, 1980. The general title of the volumes is
Statistical Distributions in Scientific Work, a continuation from the Proceedings of an 
earlier program held at the University of Calgary during the summer of 1974, which 
brought out volumes 1, 2, and 3. The present volumes are: Volume 4 — Models, 
Structures, and Characterizations; Volume 5 — Inferential Problems and Properties; 
and Volume 6 — Applications in Physical, Social, and Life Sciences. These are 
based on the research-review expositions, instructional lectures, and research papers 
specially prepared for the program by the invited researchers and expositors. 

The planned activities of the Institute consisted of lucid perceptive lectures and 
expositions, seminar lectures, study group discussions, tutorials, and individual 
study. The activities included meetings of editorial committees to discuss editorial 
matters for these proceedings which consist of the contributions that have gone 
through the usual refereeing process. The overall perspective of the program is 
provided by the Chairman of the Organizing Committee, Professor G. P. Patil, in his 
Foreword to the Volumes as summarized from his inaugural address to the Institute. 

The Proceedings are being published in three volumes. All together, they consist of 
15 topical sections comprising 100 contributions and 1260 pages of research, review, and
exposition. Subject and author indexes also appear at the end of each volume. Effort 
has been made to keep the title and the content of each volume mutually consistent. 
However, it is quite possible that a different composition would have looked equally 
natural! 

We view this program as a continuation of the tradition established by the pioneer- 
ing 1963 Montreal Symposium which identified and consolidated statistical distribu- 
tions as a separate field of statistical inquiry. The tradition was further carried on and 
amplified by the 1974 Calgary program. It was reassuring to see several participants 
at Trieste that were present at Montreal and/or Calgary. A number of new and young 
faces were also visible at Trieste. The papers in these Proceedings should reflect the 
recent and current developments and mirror the growth and maturity of the discipline 
and its integration within the general framework of applied statistics and related 
quantitative studies. 

While working in the field of statistical distributions in general, it is often tempting 
to tackle isolated problems involving formal generalizations. One at times loses sight 
of the underlying probabilistic model even in this process. While this generalization 
approach may be quite acceptable from the mathematical point of view, it does 
however result, on occasion, in statistically unjustified theoretical exercises. There 
has been some justified criticism voiced by practitioners that we are losing touch with 
reality. A purpose of the Trieste program was to help generate a constructive dialogue 
between theory and application. 

The program covered a broad spectrum of topics. The models and structures theme
touched base with continuous models, discrete models, properties, computer generation,
and characterizations. Inferential problems and properties included distributional
testing and goodness-of-fit, parameter estimation, hypothesis testing, approximations,
reliability and life testing. Real world problems were drawn from the
physical sciences, social sciences and life sciences, and also included work on 
extreme values and order statistics. Thus, the formal and informal dialogues provided 
a panorama of the distributional field both in theory and in application. These 
published volumes constitute an effort to share those Proceedings with the interested 
reader. The spark and the spontaneity of a lively dialogue do not necessarily transmit 
themselves through written proceedings. We hope and trust, however, that the reader 
will instead reap the benefit from the careful preparation and editing through which 
each paper has gone. 

In any collaborative effort of this magnitude and nature, the enthusiastic support of 
a large number of individuals and institutions is a prerequisite for success. We are 
extremely grateful to all of our sponsors, participants, and the hosts. Also to our 
ever-cheerful program secretary, Barbara Alles, who has managed to keep the 
program moving in every sense of the word. 

These three volumes have been included in the ongoing NATO Advanced Study 
Institutes Series. They are published by the D. Reidel Publishing Company, a 
member of the Board of Publishers of the NATO ASI Series. It is only proper that we 
conclude here with our sincere thanks to both the Publisher and the NATO Scientific 
Affairs Division for these co-operative arrangements. 


April 30, 1981 Charles Taillie 
Ganapati P. Patil 
Bruno A. Baldessari 


A REVIEW OF DISTRIBUTIONAL TESTING PROCEDURES
AND DEVELOPMENT OF A CENSORED SAMPLE 
DISTRIBUTIONAL TEST 


SAMUEL S. SHAPIRO and 
CARLOS W. BRAIN 


Department of Mathematical Sciences 
Florida International University 
Miami, Florida 33199 USA 


SUMMARY. A review of recent procedures for tests of distributional 
assumptions is given. Test procedures are grouped into three 
categories: regression type tests, probability integral trans- 
formation tests and special feature tests. The application of 
regression tests to distributions with location and shape para- 
meters with emphasis on the normal, multivariate normal, expon- 
ential, and Weibull distributions and the advantages and limi- 
tations of recently improved EDF procedures are discussed. Tests 
which use characteristics unique to each of the normal, expon- 
ential, gamma, extreme value and Weibull distributions are also 
discussed. 

New regression test procedures for censored samples are dis- 
cussed in general and then applied to the exponential and normal 
distributions. 


KEY WORDS. distributional tests, W tests, EDF tests, censored 
samples. 


1. INTRODUCTION 


The subject of testing for distributional assumptions dates 
back to the work of K. Pearson (1900) in which he devised the 
chi-square goodness of fit test. Statisticians have been inter- 
ested in this subject because inferences based on an assumed 
statistical model can be quite poor if the assumption is incorrect. 
In the last ten years a number of new procedures have been
proposed. The first part of this paper represents an attempt 
to group these procedures into general categories and present a 
review of the latest developments. The review is limited to
recent work and no attempt has been made to include every pro- 
cedure. 


The second part of the paper develops a rationale for a test 
procedure which can be used with censored samples. This idea is 
then applied to develop test statistics for the exponential and 
normal distributions. 


2. REVIEW 


Tests of distributional assumptions have been categorized
into three groupings: regression type tests, probability transformation
tests and special feature tests. The latter category
contains procedures which use a special characteristic of the null
distribution as the test criterion, e.g., the standardized third
moment, $\sqrt{b_1}$, test for the normal distribution makes use of the
fact that $\sqrt{\beta_1} = 0$. The advantage of the tests in the first two
categories is that they can be generalized for many null distributions
while the special feature tests cannot. In passing it
should be noted that the chi-square goodness of fit test has been 
omitted. Many practitioners have found its power properties 
lacking or dislike the necessity of discretizing the data. 


2.1 Regression Tests. The best known of the regression procedures
is a subjective, graphical procedure known as probability plotting.
This procedure is available for many distributions and can also be
used with censored samples. A probability plot can be considered
as the graphical representation of the regression of the ordered
observations on the expected values of the order statistics from
a standardized distribution (distribution with unit scale parameter
and zero location parameter). Thus,

$$ y_{i|n} = \mu + \sigma m_{i|n} + e_i \tag{1} $$

represents such a model, where

$y_{i|n}$ is the ith ordered observation in a sample of size n,
$m_{i|n}$ is the corresponding expected value of the ith order
statistic from a standardized distribution,
$\mu$ is the location parameter,
$\sigma$ is the scale parameter, and
$e_i$ is the measurement error.
(Subsequently the subscript n above will be suppressed and in
the following $y_i$ represents the ith ordered observation.) In
practice one does not need to know the $m_i$'s. Blom (1958) showed
that $m_i = F^{-1}(\pi_i)$ where

$$ \pi_i = (i - \alpha_i)/(n - \alpha_i - \beta_i + 1) $$

and $F^{-1}$ is the inverse of the standardized distribution function.
Thus, a plot of $y_i$ vs $m_i$ can be carried out by using
specially scaled paper and plotting $y_i$ vs $\pi_i$; the scaling
converts $\pi_i$ to $F^{-1}(\pi_i)$. The values of $\alpha_i$ and $\beta_i$ vary
from distribution to distribution; however, a commonly used compromise
is to set $\alpha_i = \beta_i = 1/2$. The rationale behind the technique
is that if the $y_i$'s are a random sample from the null
distribution represented by the $m_i$'s then a straight line plot,
up to random fluctuations, should result. Departures from
linearity are indicative of lack of fit. The procedure of course
is subjective and there is no type I error associated with the
procedure.
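
The plotting positions described above can be sketched in a few lines of Python. This is a minimal illustration for the normal null distribution, assuming the compromise values $\alpha_i = \beta_i = 1/2$; the function names `blom_positions` and `normal_plot_points` and the sample data are hypothetical and chosen only for the example.

```python
from statistics import NormalDist


def blom_positions(n, alpha=0.5, beta=0.5):
    """Plotting positions pi_i = (i - alpha) / (n - alpha - beta + 1), i = 1..n."""
    return [(i - alpha) / (n - alpha - beta + 1) for i in range(1, n + 1)]


def normal_plot_points(sample):
    """Pair each ordered observation y_i with the score F^{-1}(pi_i).

    F is taken to be the standard normal distribution function
    (an assumption of this sketch; any standardized null could be used).
    """
    y = sorted(sample)
    scores = [NormalDist().inv_cdf(p) for p in blom_positions(len(y))]
    return list(zip(scores, y))


# Illustrative data only: the points should fall near a straight line
# with intercept mu and slope sigma if the normal model fits.
points = normal_plot_points([4.1, 5.3, 3.8, 6.0, 4.9])
```

Plotting the second coordinate against the first reproduces what specially scaled probability paper does by hand.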


Detailed descriptions of how to make and use probability 
plots can be found in Shapiro (1980) and Hahn and Shapiro (1967). 


In order to make this procedure more objective, Shapiro and
Wilk (1965) suggested testing for linearity by considering b,
the slope of the regression line, which provides an estimate of
$\sigma$ in (1). This estimate of $\sigma$, when squared, is then compared
to $s^2$, the total sum of squares about the mean, which is an
estimate of $(n-1)\sigma^2$, much the same way as is done in an analysis
of variance procedure. If the model is correct, and hence the
straight line is the proper model, then $b^2$ provides an estimate
of $\sigma^2$; however, if the model is inappropriate then $b^2$ will not
be an appropriate estimator. The estimator $s^2$ always provides
an estimate of $(n-1)\sigma^2$. The ratio $b^2/s^2$ is independent of location and
scale parameters and thus does not require knowing their
values.


Using this rationale Shapiro and Wilk developed tests for the
normal (1965) and the two-parameter exponential (1972) distributions.
The test statistics were respectively

$$ W_N = \left[ \sum_{i=1}^{[n/2]} a_{n-i+1} (y_{n-i+1} - y_i) \right]^2 \Big/ \sum_{i=1}^{n} (y_i - \bar{y})^2 \tag{2} $$

and

$$ W_E = \frac{n (\bar{y} - y_1)^2}{(n-1) \sum_{i=1}^{n} (y_i - \bar{y})^2} $$


where [n/2] is the greatest integer symbol and the $a_i$'s are
constants derived from the expected values and covariance matrix
of the normal order statistics and are required for linear estimation
of $\sigma$ based on the slope of the regression line. The
authors provided the $a_i$'s and percentiles of $W_N$ for samples
up to size 50 and the percentiles of $W_E$ up to sample size
100. Shapiro and Francia (1972) extended the normality test up
to sample size 100 by replacing the $a_i$'s with coefficients
that depend only on the expected values of the order statistics.
The modified procedure

$$ W' = \left[ \sum_{i=1}^{[n/2]} b_{n-i+1} (y_{n-i+1} - y_i) \right]^2 \Big/ \sum_{i=1}^{n} (y_i - \bar{y})^2 $$

can be extended to even higher sample sizes since the $b_i$'s are
simple functions of the $m_i$'s. Shapiro (1980) provided a table
of b values and percentiles for n up to 99.
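
As a rough sketch of the modified procedure, the Python below computes a statistic of the $W'$ form. Two assumptions are made that are not taken from the text: the expected normal order statistics $m_i$ are replaced by the approximation $\Phi^{-1}((i - 3/8)/(n + 1/4))$ rather than tabulated values, and the coefficients are taken as $b_i = m_i / (\sum m_j^2)^{1/2}$. The names `approx_normal_scores` and `shapiro_francia` are hypothetical, and published percentiles would still be needed to carry out a test.

```python
from math import sqrt
from statistics import NormalDist, fmean


def approx_normal_scores(n):
    """Approximate expected normal order statistics m_1..m_n.

    Uses Phi^{-1}((i - 3/8) / (n + 1/4)); an approximation assumed for
    this sketch, not the tabulated values used in the original papers.
    """
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def shapiro_francia(sample):
    """W' = [sum_{i<=n/2} b_{n-i+1} (y_{n-i+1} - y_i)]^2 / sum (y_i - ybar)^2."""
    y = sorted(sample)
    n = len(y)
    m = approx_normal_scores(n)
    norm = sqrt(sum(v * v for v in m))
    b = [v / norm for v in m]  # assumed normalisation: b = m / sqrt(m'm)
    # Fold the sum using the antisymmetry b_i = -b_{n-i+1}.
    slope = sum(b[n - 1 - k] * (y[n - 1 - k] - y[k]) for k in range(n // 2))
    ybar = fmean(y)
    total_ss = sum((v - ybar) ** 2 for v in y)
    return slope ** 2 / total_ss
```

Values near one indicate a nearly linear probability plot; small values indicate lack of fit.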


Stephens (1978) provided a modified form of the $W_E$ procedure
for the case where the location parameter is known. In
this procedure the known location parameter, $\mu$ (the minimum
possible value), is included as an observation. This increases
the sample size to n+1 and the $W_E$ statistic is computed with
$\mu$ replacing $y_1$ in the formula and n+1 replacing n. The
same table of percentiles is used except that sample size is
considered to be n+1.
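
Because $W_E$ needs no tabulated coefficients, it is easy to sketch in code. The Python below computes $W_E$ and the known-origin variant; the function names are hypothetical, and the variant assumes that "including $\mu$ as an observation" means computing every quantity in the formula, the mean included, from the augmented sample of size n+1.

```python
from statistics import fmean


def w_exponential(sample):
    """W_E = n (ybar - y_1)^2 / ((n - 1) * sum (y_i - ybar)^2)."""
    y = sorted(sample)
    n = len(y)
    ybar = fmean(y)
    total_ss = sum((v - ybar) ** 2 for v in y)
    return n * (ybar - y[0]) ** 2 / ((n - 1) * total_ss)


def w_exponential_known_origin(sample, mu):
    """Known-origin variant: append mu and treat the sample size as n + 1.

    Assumption of this sketch: all quantities are recomputed from the
    augmented sample, so mu plays the role of y_1.
    """
    if mu > min(sample):
        raise ValueError("mu must not exceed the smallest observation")
    return w_exponential(list(sample) + [mu])
```

The statistic is unchanged by shifting or rescaling the data, which is why no parameter values are needed to compute it.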


Sarkadi (1975) proved that the $W_N$ and $W'$ are asymptotically
equivalent and that both are consistent. Weisberg (1974)
showed that the percentage points for $W_N$ and $W'$ were similar
and $W_N$ values could be used to estimate the $W'$ values for
n < 50.


The power of the $W_N$ procedure has been studied by many
authors who have generally used it as the standard to which a new
procedure is compared. In most of these cases the power of the
W procedure ranks high for wide ranges of differing shaped distributions.
Another approach to assessing the linearity of (1) is to
fit higher order polynomials to the model and test for lack of
fit. This was first suggested by Shapiro (1964) and developed
by LaBrecque (1977). In this latter paper three alternative
models were used. The first was

$$ y_i = \mu + \sigma m_i + \alpha \phi_2(m_i) + \beta \phi_3(m_i) + e_i $$

and the second and third set $\alpha = 0$ and then $\beta = 0$, respectively.
The $\phi_j(m_i)$ are orthogonal polynomials of order j and
a test of the hypothesis $\alpha = 0$ and/or $\beta = 0$ (depending on the
model) provides a technique for testing for goodness of fit.
LaBrecque suggested in testing for normality use of the quadratic
function against skewed alternatives and the cubic function
against symmetric alternatives. The three tests suggested for
normality were


f tw (Qo op 80/26 Fie a2/ee, . Pemooe- 
at 2 3 
n/2 
2 a, 74~ Yy-i41) for n even 
n ah 
where a= 
[n/2] } 
a ays Yo eo va for n odd 
i=l 
2 2 
sore htL2] ‘ = 
ie oy by oe z Seer X(y,-y) /(n-1) 


and the $a_i$'s and $b_i$'s for n < 64 were provided in the paper.


LaBrecque provided a table of percentiles for n < 12 and a
simple function to compute the percentiles for n > 12.


Another version of the regression tests was suggested by
Filliben (1975). In this procedure the probability plot was considered
as a regression of the ordered observations on the medians
of the order statistics ($m_i'$) and the author used the coefficient
of correlation between the $y_i$'s and the $m_i'$'s as the test
criterion. Thus,

$$ r = \frac{\sum (y_i - \bar{y})(m_i' - \bar{m}')}{\sqrt{\sum (y_i - \bar{y})^2 \sum (m_i' - \bar{m}')^2}} $$

is used as the test statistic. The rationale here is that if
equation (1) holds, i.e., a straight line is the appropriate
fit, then r will be close to one. One should note that r
will always be positive and relatively high since the $y_i$'s and
the $m_i'$'s are both ordered. The distribution of r depends
only on the sample size and the null distribution and Filliben
provided the percentiles of r up to n = 100 for the normal
distribution. The author also recommended obtaining the medians
as follows. The normal order statistic medians, $m_i'$'s, are
related to the uniform order statistic medians, $g_i$'s, through
the inverse probability integral transformation. An approximation
to the uniform order statistic medians is

$$ g_1 = 1 - g_n, \quad g_n = (0.5)^{1/n}, \quad g_i = (i - 0.3175)/(n + 0.365), \quad i = 2, 3, \ldots, n-1, $$

and $m_i' = F^{-1}(g_i)$.


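The calculation of r can be simplified for the normal
distribution since $\bar{m}' = 0$ and $m_i' = -m_{n-i+1}'$; thus

$$ r = \frac{\sum_{i=1}^{n} m_i' y_i}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2 \sum_{i=1}^{n} m_i'^2}} $$

where $y_1 \le y_2 \le \cdots \le y_n$.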
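
The correlation procedure can be sketched as follows for the normal null distribution, using the median approximation quoted above. The function names `uniform_median_approx` and `filliben_r` are hypothetical, the general (unsimplified) form of r is used, and at least two observations are assumed; Filliben's published percentiles are still needed to judge significance.

```python
from math import sqrt
from statistics import NormalDist, fmean


def uniform_median_approx(n):
    """Approximate uniform order statistic medians g_1..g_n."""
    g = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    g[-1] = 0.5 ** (1.0 / n)  # g_n = (0.5)^(1/n)
    g[0] = 1.0 - g[-1]        # g_1 = 1 - g_n
    return g


def filliben_r(sample):
    """Correlation between ordered observations and normal order statistic medians."""
    y = sorted(sample)
    n = len(y)
    nd = NormalDist()
    m = [nd.inv_cdf(p) for p in uniform_median_approx(n)]  # m_i' = F^{-1}(g_i)
    ybar, mbar = fmean(y), fmean(m)
    num = sum((a - ybar) * (b - mbar) for a, b in zip(y, m))
    den = sqrt(sum((a - ybar) ** 2 for a in y) * sum((b - mbar) ** 2 for b in m))
    return num / den
```

A sample that is an exact linear function of the medians gives r = 1; skewed or heavy-tailed data pull r below one.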


The regression procedures are quite general and the test 
rationale can be used for any location-scale parameter family or 
one that can be transformed to such a family. However, since the 
coefficients and percentiles depend on the null hypothesis these 
must be generated for each case. For example, Shapiro (1964) 
derived the W test for the uniform distribution as 


$$ W_U = (y_n - y_1)^2 \Big/ \sum_{i=1}^{n} (y_i - \bar{y})^2 $$


and Brain and Shapiro (1980) developed a statistic for testing
for the Weibull distribution. This latter procedure makes use
of the fact that the function $x_i = \ln y_i$ transforms a Weibull
variate to a type 1 extreme value (smallest value) distribution
which is a location-scale parameter family. The test statistic
is


$$ W_W = \frac{[0.6079 L_2 - 0.2570 L_1]^2}{n \sum_{i=1}^{n} (x_i - \bar{x})^2} $$

where $x_i = \ln y_i$, $L_1 = \sum_{i=1}^{n} a_i x_i$, $L_2 = \sum_{i=1}^{n} b_i x_i$.


The numerator of this statistic uses a linear estimator of the 
scale parameter (for extreme value distribution) suggested by 
D'Agostino (1971). Brain and Shapiro (1980) provide the necessary 
constants and percentiles for sample sizes up to 50.


A test of multivariate normality using the $W_N$ procedure
was developed by Malkovich and Afifi (1973) using S. N. Roy's
(1957) "union-intersection" principle of test construction. The
test is constructed as follows. If $Y$ is a multivariate normal
random variable of dimension $p \times 1$ then $C'Y$ is univariate
normal for all constant vectors $C$, $C \ne 0$. A multivariate normal
sample $Y_1, \ldots, Y_n$ can then be reduced to a univariate normal
sample $C'Y_1, \ldots, C'Y_n$ and the order statistics of this univariate
sample can be used to evaluate the $W_N$ statistic given in
(2). Thus, if one can find the vector $C^*$ that minimizes $W_N$
and use the corresponding values of $C^{*\prime}Y$ in $W_N$ then a test for
normality is given by

$$ W_N(C^*) = \min_{C} W_N(C) \le K_W, $$

where $K_W$ is a constant. A solution for $C^*$ does not exist but
an approximation is given by the authors. The test is carried out
as follows:


1. Let $Y_m$ be the observation vector for which

$$ (Y_m - \bar{Y})' A^{-1} (Y_m - \bar{Y}) = \max_{1 \le j \le n} (Y_j - \bar{Y})' A^{-1} (Y_j - \bar{Y}), $$

where $A = \sum_{j=1}^{n} (Y_j - \bar{Y})(Y_j - \bar{Y})'$ and $\bar{Y}$ is the sample mean
vector.

2. Order the statistics

$$ U_j = (Y_m - \bar{Y})' A^{-1} (Y_j - \bar{Y}), \quad j = 1, 2, \ldots, n, $$

and denote them by $U_{(1)} \le U_{(2)} \le \cdots \le U_{(n)}$.

3. The test statistic is

$$ W^* = \frac{\left[ \sum_{i=1}^{[n/2]} a_{n-i+1} (U_{(n-i+1)} - U_{(i)}) \right]^2}{(Y_m - \bar{Y})' A^{-1} (Y_m - \bar{Y})}, $$

where the $a_i$'s are given in Shapiro and Wilk (1965).


In summary, the regression tests are quite general, can be
generated for most location-scale parameter families for which
linear estimation of the scale parameter can be accomplished,
are scale and location invariant and in the studies to date
(Shapiro, Wilk and Chen (1968), LaBrecque (1977) and Filliben
(1975)) have been shown to have good power against a wide range
of alternatives relative to other procedures. The power differentials
among the various regression procedures are small and
vary with sample size and alternative distribution. Choice of
any particular one could be dependent on ease of computation or
availability of a computer program.


2.2 Probability Integral Transformation Tests. It is a well-known
fact that the distribution function of a continuous random
variable, evaluated at that variable, is uniformly distributed.
The transformation

$$ p_i = \int_{-\infty}^{y_i} f(t)\,dt $$

converts the random variable $y_i$ with density f(y) to $p_i$
which has a uniform density. Thus, a test for a distributional
assumption can be carried out by transforming the observations
$y_i$ to $p_i$ and testing the $p_i$'s for uniformity. The transformation
can be performed if the parameters of f(y) are known. A series
of tests have been developed using this rationale. The best 
known of these are the Kolmogorov-Smirnov, Cramér-von Mises, 
Anderson-Darling and Watson procedures. In addition, there have 
been a number of suggestions made in an attempt to improve the 
power of these procedures, for example, Durbin (1961) suggested 

a series of transformations which in the non-null case would 

tend to make the transformed variates more non-uniform. The 
major advantage of these procedures, often called Empirical 
Distribution Function or EDF procedures, is that if the parameters
are known the distribution of the test statistic does not depend
on the null distribution and unlike the regression procedures,
one set of percentiles works for all null distributions. The
major disadvantages of these procedures are that when the parameters
are not known the null distribution varies with the null
hypothesis and even when the parameters are known the power of
the procedures is relatively low compared to the regression test.


A major development of EDF tests, which greatly increased 
their usefulness was due to Stephens (1974, 1976) and Lilliefors 
(1967). These modifications changed the simple hypothesis tests 
into ones valid for composite hypotheses and at the same time 
increased their power. Estimators of the parameters could now 
be used. However, this meant that percentiles of the test statis- 
tic are now dependent on the null distribution. 


Stephens (1976) used the following approach to develop composite
hypothesis procedures for the normal and exponential (one
parameter) distribution. The parameters for the normal are
estimated by $\bar{y}$ and s and the parameter for the exponential by
$1/\bar{y}$. The observations are ordered such that $y_1 \le y_2 \le \cdots \le y_n$.
The $y_i$'s are standardized by the transformation

$$ w_i = (y_i - \bar{y})/s $$

for the normal case, and

$$ w_i = y_i/\bar{y} $$

for the exponential case. Next the probability integral transformation
is made, i.e.,

$$ z_i = F(w_i), $$

which for the exponential distribution is simply

$$ z_i = 1 - e^{-w_i}. $$

The $z_i$'s are used for computing the test statistics. Percentiles
were given by Stephens (1979) for the following procedures.

Cramér-von Mises Test: $W^2 = \sum_{i=1}^{n} [z_i - (2i-1)/2n]^2 + 1/12n$.

Watson's Test: $U^2 = W^2 - n(\bar{z} - 1/2)^2$.

Anderson-Darling Test: $A^2 = -\left\{ \sum_{i=1}^{n} (2i-1) \ln[z_i (1 - z_{n+1-i})] \right\} / n - n$.


The test statistics for the normal distribution are computed as follows

$$W^{2*} = W^2(1 + 0.5/n), \quad U^{2*} = U^2(1 + 0.5/n),$$

$$A^{2*} = A^2(1 + 0.75/n + 2.25/n^2)$$

and for the exponential distribution

$$W^{2*} = W^2(1 + 0.16/n), \quad U^{2*} = U^2(1 + 0.16/n),$$

$$A^{2*} = A^2(1 + 0.6/n).$$


Stephens (1974) gave the upper tail percentiles (critical region) 
for these procedures and Stephens (1976) expanded the available 
percentiles and included some in the lower tail. Pettitt and 
Stephens (1976) adapted the EDF procedures for censored samples. 


They gave percentiles for $W^2$, $A^2$ and $U^2$ when censoring is in the upper, lower or both tails for the normal distribution with parameters unknown.
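The computation described above for the one-parameter exponential case can be sketched as follows. This is only an illustrative sketch, not code from the paper: the function name `edf_statistics_exponential` and the returned dictionary keys are hypothetical, and the modification factors are the ones quoted in the text for the exponential distribution.

```python
import math


def edf_statistics_exponential(y):
    """EDF statistics for a one-parameter exponential null hypothesis.

    The scale is estimated by the sample mean, the observations are
    standardized as w_i = y_i / ybar and transformed to z_i = 1 - exp(-w_i).
    All observations are assumed to be strictly positive.
    """
    n = len(y)
    ybar = sum(y) / n
    z = sorted(1.0 - math.exp(-v / ybar) for v in y)

    # Cramer-von Mises: W^2 = sum [z_i - (2i-1)/2n]^2 + 1/12n
    w2 = sum((z[j] - (2 * (j + 1) - 1) / (2 * n)) ** 2 for j in range(n))
    w2 += 1.0 / (12 * n)

    # Watson: U^2 = W^2 - n (zbar - 1/2)^2
    zbar = sum(z) / n
    u2 = w2 - n * (zbar - 0.5) ** 2

    # Anderson-Darling: A^2 = -{sum (2i-1) ln[z_i (1 - z_{n+1-i})]}/n - n
    a2 = -sum(
        (2 * (j + 1) - 1) * (math.log(z[j]) + math.log(1.0 - z[n - 1 - j]))
        for j in range(n)
    ) / n - n

    # Modified statistics for the exponential case, as quoted in the text.
    return {
        "W2": w2,
        "U2": u2,
        "A2": a2,
        "W2_mod": w2 * (1 + 0.16 / n),
        "U2_mod": u2 * (1 + 0.16 / n),
        "A2_mod": a2 * (1 + 0.6 / n),
    }
```

The modified values would then be compared with the published percentiles, which are not reproduced here.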


Stephens (1974) showed by Monte Carlo study that the power of the Anderson-Darling procedure was about as good as the W procedure and recommended it because there were no special constants required to use the test and the percentiles of $A^{2*}$ are just a simple function of sample size and hence no extensive table is needed.


The Kolmogorov-Smirnov procedure is defined as

$$D = \sqrt{n} \max_{1 \le r \le n} \left| \frac{r}{n} - F(y_r) \right|$$

where r is the rank of the observation, $y_1 \le y_2 \le \cdots \le y_n$, and $F(y_r)$ is the transformed variate. Exact distributional results for the Kolmogorov-Smirnov tests for the parameter estimated case were obtained by Durbin (1975) for the exponential distribution. Percentiles for sample sizes from 2 to 100 were given.

The use of the Kolmogorov-Smirnov procedure is not recommended since in almost all studies it was shown to have poorer power than either the regression tests or the Anderson-Darling.


Stephens (1977) used the same approach as in his above cited papers to determine the percentiles of $W^2$, $A^2$ and $U^2$ for testing for the extreme value distribution and Stephens (1979) covered the logistic distribution

$$F(x) = [1 + \exp\{-(x-\alpha)/\beta\}]^{-1}.$$

A multivariate normal version of $W^2$ was developed by Malkovich and Afifi (1973). Pettitt (1979) used the $W^2$ procedure as a test of bivariate normality.


2.3 Specific Property Test. This section describes a number of tests which make use of a specific property of a distribution. These tests are special in that they cannot be generalized to other models. Two of the better known tests for the normal distribution are based on the fact that its standardized third and fourth moments (skewness and kurtosis) are zero and three, respectively. The estimators of these moments have been used as separate tests for normality. These are

$$\sqrt{b_1} = \frac{\sum (y_i - \bar{y})^3/n}{[\sum (y_i - \bar{y})^2/n]^{3/2}}, \qquad b_2 = \frac{\sum (y_i - \bar{y})^4/n}{[\sum (y_i - \bar{y})^2/n]^{2}},$$

where $\bar{y}$ is the sample mean. Recently approximations to the distribution of each of these statistics were obtained by D'Agostino and Pearson (1973) and D'Agostino and Tietjen (1973).


These authors used a Johnson distribution to approximate the distribution of the statistics (Johnson distributions can be expressed as transformations of standard normal variables). D'Agostino and Pearson (1973) suggested, in an attempt to improve the power of these procedures, that the two statistics be combined into one omnibus test. They suggested using the statistic

$$K^2 = X^2(\sqrt{b_1}) + X^2(b_2)$$

where $X^2(\cdot)$ was the square of the transformed Johnson variable. Thus, $K^2$ is the sum of squares of two standard normal variables; however, since $\sqrt{b_1}$ and $b_2$ are not independent it is not distributed as chi-square with two degrees of freedom. Bowman and Shenton (1975) provided a series of contours of the 90, 95 and 99 percentiles of $K^2$.
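As an illustration of the two moment statistics, the following sketch computes the sample skewness and kurtosis using moments about the sample mean with divisor n. The function name `moment_statistics` is hypothetical, and the Johnson transformation needed for $K^2$ is deliberately not attempted here because its constants are not given in the text.

```python
def moment_statistics(y):
    """Return (sqrt_b1, b2): sample skewness and kurtosis with divisor n."""
    n = len(y)
    ybar = sum(y) / n
    m2 = sum((v - ybar) ** 2 for v in y) / n
    m3 = sum((v - ybar) ** 3 for v in y) / n
    m4 = sum((v - ybar) ** 4 for v in y) / n
    sqrt_b1 = m3 / m2 ** 1.5
    b2 = m4 / m2 ** 2
    return sqrt_b1, b2
```

For a normal sample the two values should be near zero and three respectively; a symmetric but flat sample such as 1, 2, 3, 4, 5 gives zero skewness and a kurtosis well below three.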


Another property of the normal distribution that has been used in test construction is that its entropy exceeds that of any other distribution with the same variance. Vasicek (1976) suggested the statistic

$$K_{m,n} = \frac{n}{2m\hat{\sigma}} \left\{ \prod_{i=1}^{n} (y_{i+m} - y_{i-m}) \right\}^{1/n}$$

where $\hat{\sigma}^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2/n$, m is a positive integer less than n/2, $y_{i-m} = y_1$ if $i - m < 1$, $y_{i+m} = y_n$ if $i + m > n$, and $y_1 \le y_2 \le \cdots \le y_n$.
The author used Monte Carlo procedures to obtain the percentiles of $K_{m,n}$ for $n \le 50$ and m = 1(1)5. Vasicek (1976) did a Monte Carlo study of the power of the procedure for sample size 20 and m = 3 and showed that the power of $K_{3,20}$ was similar to that of both $A^{2*}$ and W. He also suggested for optimum power that m = 2 be used for n = 10, m = 3 for n = 20 and m = 4 for n = 50.
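A minimal sketch of the entropy statistic is given below, assuming the boundary convention that order statistics with index below 1 or above n are replaced by the smallest and largest observation. The function name `vasicek_statistic` is hypothetical, and the product is accumulated on the log scale only to avoid overflow or underflow.

```python
import math


def vasicek_statistic(y, m):
    """Entropy-based statistic K_{m,n} for testing normality."""
    n = len(y)
    if not (1 <= m < n / 2):
        raise ValueError("m must be a positive integer less than n/2")
    ys = sorted(y)
    ybar = sum(ys) / n
    sigma_hat = math.sqrt(sum((v - ybar) ** 2 for v in ys) / n)

    log_product = 0.0
    for i in range(n):
        upper = ys[min(i + m, n - 1)]  # y_{i+m}, truncated at y_n
        lower = ys[max(i - m, 0)]      # y_{i-m}, truncated at y_1
        spacing = upper - lower
        if spacing <= 0.0:
            return 0.0                 # heavy ties make the product zero
        log_product += math.log(spacing)

    return n / (2 * m * sigma_hat) * math.exp(log_product / n)
```

The statistic is location and scale invariant, so it can be used for the composite normal hypothesis; small values point away from normality, and the critical values would come from the Monte Carlo percentiles mentioned above.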


Another special property procedure developed as a composite test for the gamma distribution was suggested by Locke (1976). Using the fact that if $x_1$ and $x_2$ are independent, non-degenerate random variables then $x_1 + x_2$ and $x_1/x_2$ are independent if and only if they are gamma variates with the same scale parameter, Locke suggested forming n/2 bivariate pairs [U,V] where $U_j = x_{2j-1} + x_{2j}$ and $V_j = \max[x_{2j-1}/x_{2j},\, x_{2j}/x_{2j-1}]$. He then uses Kendall's rank test to see if U and V are independent.

This is one of the few procedures available for testing the composite hypothesis for the gamma distribution.
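The pairing step and the rank correlation can be sketched as below. The helper names `locke_pairs` and `kendall_tau` are hypothetical; the tau computed here is the simple version that counts concordant minus discordant pairs over all pairs and makes no correction for ties, and the significance assessment is not included.

```python
def locke_pairs(x):
    """Form the n/2 pairs (U_j, V_j) from consecutive positive observations."""
    pairs = []
    for j in range(len(x) // 2):
        first, second = x[2 * j], x[2 * j + 1]
        u = first + second
        v = max(first / second, second / first)
        pairs.append((u, v))
    return pairs


def kendall_tau(pairs):
    """Kendall's rank correlation between the two coordinates (no tie correction)."""
    concordant = discordant = 0
    count = len(pairs)
    for i in range(count):
        for j in range(i + 1, count):
            sign = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (count * (count - 1) / 2)
```

A value of tau far from zero would cast doubt on the independence of U and V and hence on the gamma hypothesis.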


A variety of test statistics have been developed for the exponential distribution. Defining a gap as the difference between two successive order statistics $y_i$ and $y_{i-1}$, it is well known that for an exponential distribution

$$2\lambda S_i = 2\lambda(n-i+1)(y_i - y_{i-1}), \quad i = 1,2,\ldots,n,$$

where $y_0$ is the minimum value of y and $y_1 \le y_2 \le \cdots \le y_n$, are independent $\chi^2(2)$ variates. Gnedenko et al. (1969), and later Fercho and Ringer (1972), suggested using as a two-tailed test for (one-parameter) exponentiality

$$Y(r) = (n-r) \sum_{i=1}^{r} S_i \Big/ r \sum_{i=r+1}^{n} S_i,$$

where $S_1 = n y_1$ and r is a chosen integer. If $\mu$ is the origin then $S_1 = n(y_1 - \mu)$.

Since the $2\lambda S_i$'s are independent $\chi^2(2)$ then $Y(r)$ has an F(2r,2(n-r)) distribution. A modification of this procedure was suggested by Harris (1976) which compared the gaps in the tails to the middle. He recommended

$$Y'(r) = (n-2r)\left[ \sum_{i=1}^{r} S_i + \sum_{i=n-r+1}^{n} S_i \right] \Big/ 2r \sum_{i=r+1}^{n-r} S_i.$$

Another version of this procedure was suggested by Lin and Mudholkar (1980). They suggested keeping each tail comparison as a separate component of a bivariate statistic, i.e., treating

$$F_1 = (n-2r) \sum_{i=1}^{r} S_i \Big/ r \sum_{i=r+1}^{n-r} S_i \quad \text{and} \quad F_2 = (n-2r) \sum_{i=n-r+1}^{n} S_i \Big/ r \sum_{i=r+1}^{n-r} S_i$$

as a bivariate F distribution.

For a composite two-parameter test for exponentiality $S_1$ can be completely deleted in the above suggested test statistics. The affected terms would lose two degrees of freedom.

The power differentials of the step and leap procedures are small. Neither the Lin-Mudholkar procedure nor the $Y'(r)$ procedure appears to improve on the Gnedenko procedure. Lin and Mudholkar recommend, for optimum power, using r = [n/10] in their procedure while r = [n/2] is generally used with $Y(r)$ and r = [n/4] with $Y'(r)$, where [·] is the greatest integer symbol. These tests can be used with censored samples since $S_i$ depends only on the two adjacent readings and is an independent chi-square two-degree of freedom variable.
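The gap statistics above can be sketched as follows, assuming a known origin (zero by default) and a complete sample. The function names `normalized_gaps`, `gnedenko_y` and `harris_y` are hypothetical, and the comparison with F percentiles is left out.

```python
def normalized_gaps(y, origin=0.0):
    """S_i = (n - i + 1)(y_i - y_{i-1}) with y_0 equal to the origin."""
    ys = sorted(y)
    n = len(ys)
    gaps = []
    previous = origin
    for i, value in enumerate(ys, start=1):
        gaps.append((n - i + 1) * (value - previous))
        previous = value
    return gaps


def gnedenko_y(y, r, origin=0.0):
    """Y(r): mean of the first r gaps over the mean of the remaining n - r gaps."""
    s = normalized_gaps(y, origin)
    n = len(s)
    return (n - r) * sum(s[:r]) / (r * sum(s[r:]))


def harris_y(y, r, origin=0.0):
    """Y'(r): mean of the 2r tail gaps over the mean of the n - 2r middle gaps."""
    s = normalized_gaps(y, origin)
    n = len(s)
    tails = sum(s[:r]) + sum(s[n - r:])
    middle = sum(s[r:n - r])
    return (n - 2 * r) * tails / (2 * r * middle)
```

Because each statistic is a ratio of sums of gaps, the unknown scale cancels, which is why the exponential rate never has to be estimated.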


14 S. S. SHAPIRO AND C.W. BRAIN 


Mann et al. (1973) used a similar rationale to develop a test for the extreme value and Weibull distributions. They used the fact that

$$2\ell_i = 2(y_{i+1} - y_i)/E(y_{i+1} - y_i), \quad i = 1,2,\ldots,n-1$$

is asymptotically independent $\chi^2(2)$. Thus, the test statistic for a sample of size n censored at the mth of n observations

$$L(r,s,m,n) = (1/r) \sum_{i=m-r}^{m-1} \ell_i \Big/ (1/s) \sum_{i=1}^{s} \ell_i, \quad r+s+1 \le m \le n$$

has asymptotically an F(2r,2s) distribution. They then suggest an improved form

$$S(r,s,m,n) = (r/s)L/(1+(r/s)L).$$

They gave percentiles for S and showed that the power of S was better than that for other available tests.


3. CENSORED SAMPLE TEST 


In developing a procedure to be used with censored samples the regression of the ordered observations on the expected value of the order statistics will be used. It is possible, with a censored sample, to obtain two estimates of $\sigma^2$. One can be based on the slope of the regression line, as is done in the W test procedure, and the other on the residual sum of squares about the regression line. However, in this case we no longer have an analysis of variance rationale since both statistics are only proper estimators of $\sigma^2$ under the null hypothesis. The W* procedure for censored samples compares these two estimates in order to assess whether the hypothesized model is appropriate. When the null hypothesis is false the residual sum of squares will be inflated (due to the poor linear fit) while the slope will not change in a systematic way; it will depend on the alternative model. Thus, the power of the W procedure should be better than W* and hence the W test should always be used for complete samples.


The null distribution of W* will depend on the null hypo- 
thesis, the actual number of observations and perhaps the sample 
size. Sections 4 and 5 present some results for the exponential 
and normal distributions. In what follows it is assumed that the 


n-r smallest observations in a sample of size n have been recorded.

Following the rationale of Shapiro and Wilk (1965) let f(x) be a standardized density function (location parameter zero and scale parameter one). Let $X' = (x_1, x_2, \ldots, x_n)$ be the vector of order statistics of a sample of size n, and

$$M' = (m_1, m_2, \ldots, m_n)$$

be its expected value. Also let

$$V = (v_{ij}) = E[(X-M)(X-M)'], \quad i,j = 1,2,\ldots,n$$

be the corresponding covariance matrix of the order statistics. If a linear transformation

$$y_i = \sigma x_i + \mu, \quad i = 1,2,\ldots,n$$

is made with $\sigma > 0$ and $-\infty < \mu < \infty$ then the order is preserved and $y_1 \le y_2 \le \cdots \le y_n$ are the corresponding transformed order statistics. Also

$$E(y_i) = \sigma m_i + \mu, \quad i = 1,2,\ldots,n$$

and

$$\mathrm{cov}(y_i, y_j) = \sigma^2 v_{ij}, \quad i,j = 1,2,\ldots,n.$$

Writing $Y' = (y_1, y_2, \ldots, y_n)$ and $1' = (1,1,\ldots,1)$ it follows that

$$E(Y) = \mu 1 + \sigma M, \quad E[(Y - \mu 1 - \sigma M)(Y - \mu 1 - \sigma M)'] = \sigma^2 V.$$

It is well known that for a positive definite, symmetric matrix such as V, it is always possible to find a non-singular $n \times n$ matrix $T = (t_{ij})$ such that $TVT' = I$. Let $Z = TY$ where $Z' = (z_1, z_2, \ldots, z_n)$. Then

$$E(Z) = T(\mu 1 + \sigma M) = \mu C + \sigma K$$

where $C = T1$ and $K = TM$ and $\mathrm{cov}(Z) = \sigma^2 I$.

The matrix T can be chosen so as to be lower triangular and T will be unique if we require the diagonal elements to be positive. Thus,

$$z_1 = t_{11} y_1, \quad z_2 = t_{21} y_1 + t_{22} y_2, \quad \text{etc.}$$

Because the $z_i$'s are uncorrelated and have a common variance $\sigma^2$, a simple least squares regression of the elements in Z on the elements in K and C is forced through the origin so that the first and second partial regression coefficients provide estimates of $\sigma$ and $\mu$ respectively, and the residual sum of squares provides an estimate of $\sigma^2$. Denoting these as b, a, and $s^2$ respectively we have for a censored sample of n-r observations the following estimates (all sums run over i = 1,2,...,n-r):

$$b = \frac{\sum c_i^2 \sum z_i k_i - \sum c_i k_i \sum z_i c_i}{\sum c_i^2 \sum k_i^2 - [\sum c_i k_i]^2} \qquad (3)$$

$$a = \frac{\sum z_i c_i \sum k_i^2 - \sum z_i k_i \sum c_i k_i}{\sum c_i^2 \sum k_i^2 - [\sum c_i k_i]^2} \qquad (4)$$

$$s^2 = \sum z_i^2 - b \sum z_i k_i - a \sum z_i c_i. \qquad (5)$$

The W* test is defined by

$$W^* = b^2/s^2. \qquad (6)$$

The statistic (6) is origin and scale invariant, hence, it can be used as a composite test for $f(y;\mu,\sigma)$ families. The maximum value of W* is infinity which occurs when $y_i = m_i$ since $s^2 = 0$ and b = 1. Thus, when the observed values equal the expected values the W* statistic is infinite. Thus, it would be expected that W* will be a lower tail test.
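The regression through the origin on the two vectors C and K is an ordinary two-variable least squares problem, so the three quantities and W* can be computed directly from the sums of squares and cross products. The sketch below assumes the transformed vectors z, c and k are already available (computing T is distribution specific and is not shown); the function name `w_star` is hypothetical.

```python
import math


def w_star(z, c, k):
    """Return (b, a, s2, W*) for the regression of z on k and c through the origin."""
    s_cc = sum(ci * ci for ci in c)
    s_kk = sum(ki * ki for ki in k)
    s_ck = sum(ci * ki for ci, ki in zip(c, k))
    s_zk = sum(zi * ki for zi, ki in zip(z, k))
    s_zc = sum(zi * ci for zi, ci in zip(z, c))
    s_zz = sum(zi * zi for zi in z)

    det = s_cc * s_kk - s_ck ** 2
    b = (s_cc * s_zk - s_ck * s_zc) / det   # estimate of the scale
    a = (s_kk * s_zc - s_ck * s_zk) / det   # estimate of the origin
    s2 = s_zz - b * s_zk - a * s_zc         # residual sum of squares
    w = b * b / s2 if s2 > 0 else math.inf
    return b, a, s2, w
```

A perfect fit gives a zero residual sum of squares and therefore an infinite W*, which is consistent with rejecting for small values of the statistic.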


The minimum value of W* can be obtained as follows. Since the statistic is scale and origin invariant we can determine the minimum value by letting a = 0, b = 1 and finding the values of Z that maximize $s^2$ subject to the above constraints. The constraints a = 0, b = 1 imply that

$$\sum z_i k_i = \sum k_i^2 \quad \text{and} \quad \sum z_i c_i = \sum c_i k_i. \qquad (7)$$

Thus, we want to maximize

$$s^2 = \sum z_i^2 - \sum k_i^2$$

subject to the constraints (7). Since the $k_i$'s are constants we need only maximize $\sum z_i^2$. This is a convex function which assumes its maximum at one of the vertices of the (n-2) dimensional hypersphere which satisfies the constraints. Therefore, by testing each vertex the maximum value of $\sum z_i^2$ can be found and the minimum point of W* be computed.


4. APPLICATION TO THE EXPONENTIAL DISTRIBUTION 


Let $f(x) = e^{-x}$; $x > 0$. The elements of M and V take on a simple form for the exponential distribution. Using the results from Kendall and Stuart (1961) and the relationships given in Shapiro and Wilk (1972) the quantities b, a, $s^2$ in (3), (4), and (5) reduce to

$$b = \frac{\sum_{i=1}^{n-r} y_i + r y_{n-r} - n y_1}{n-r-1} \qquad (8)$$

$$a = \frac{(n-r) y_1 - \left( \sum_{i=1}^{n-r} y_i + r y_{n-r} \right)/n}{n-r-1} \qquad (9)$$

$$s^2 = \sum_{i=1}^{n-r-1} [2(n-i)(n-i+1) + 1] y_i^2 + (r+1)^2 y_{n-r}^2 - 2 \sum_{i=2}^{n-r} y_i y_{i-1} (n-i+1)^2 - b\left( \sum_{i=1}^{n-r} y_i + r y_{n-r} \right) - a n^2 y_1. \qquad (10)$$
The statistic is defined as $W^* = b^2/s^2$. The minimum value can now be found. We wish to maximize $\sum z_i^2$ subject to the constraints of (7). Now since

$$\sum c_i k_i = n, \quad \sum k_i^2 = n-r, \quad \sum z_i c_i = n z_1, \quad \sum z_i k_i = \sum z_i$$

the constraints of (7) can be written as

$$\sum z_i = n-r \quad \text{and} \quad z_1 = 1.$$

The vertices in terms of Z that satisfy these constraints are

(1, n-r-1, 0, ..., 0)
(1, 0, n-r-1, 0, ..., 0)
...
(1, 0, ..., 0, n-r-1).

Thus, $s^2 = \sum z_i^2 - \sum k_i^2 = 1 + (n-r-1)^2 - (n-r)$ and the minimum value of W* is

$$W^*_{\min} = [(n-r-1)(n-r-2)]^{-1}.$$
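For the exponential case the computation can be sketched as below. The sketch rests on an assumption that is not spelled out in the text: that the transformed variables are the normalized spacings $z_i = (n-i+1)(y_i - y_{i-1})$ with $y_0 = 0$, for which $k_i = 1$, $c_1 = n$ and all other $c_i = 0$. Under that assumption the fitted value for $z_1$ is exact, so the residual sum of squares involves only $z_2, \ldots, z_{n-r}$. The function name `w_star_exponential` is hypothetical.

```python
import math


def w_star_exponential(y_observed, n):
    """W* for the n - r smallest observations of a sample of size n.

    Returns (b, a, s2, W*). Assumes normalized spacings as the transformed
    variables, which is an assumption of this sketch.
    """
    ys = sorted(y_observed)
    m = len(ys)                      # m = n - r recorded observations
    if m < 3 or m > n:
        raise ValueError("need between 3 and n recorded observations")

    z = []
    previous = 0.0
    for i, value in enumerate(ys, start=1):
        z.append((n - i + 1) * (value - previous))
        previous = value

    b = sum(z[1:]) / (m - 1)                      # agrees with (8)
    a = (m * z[0] - sum(z)) / (n * (m - 1))       # agrees with (9)
    s2 = sum((zi - b) ** 2 for zi in z[1:])       # residual sum of squares
    w = b * b / s2 if s2 > 0 else math.inf
    return b, a, s2, w
```

Feeding in a vertex of the constraint region, for example three recorded values out of five whose spacings are (1, 2, 0), gives W* = 1/2, which matches the minimum $[(n-r-1)(n-r-2)]^{-1}$ for n-r = 3.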


The distribution of W* depends only on n-r and not on n. This can be demonstrated when n-r=3.

Theorem. The distribution of W* for n-r=3 is given by

$$g(w^*) = (2w^*)^{-3/2}, \quad w^* > 1/2.$$


Proof. Using equations (8), (9) and (10) with n-r=3 observations

$$b = [(1-n)y_1 + y_2 + (n-2)y_3]/2$$

$$a = [(3n-1)y_1 - y_2 - (n-2)y_3]/2n$$

and

$$s^2 = [(n-1)y_1 - (2n-3)y_2 + (n-2)y_3]^2/2$$

where the $y_i$ are the n-r smallest observations from a sample of size n from a standard exponential distribution. Make the transformation

$$u_1 = y_1 + y_2 + (n-2)y_3$$

$$u_2 = (1-n)y_1 + y_2 + (n-2)y_3$$

$$u_3 = (n-1)y_1 - (2n-3)y_2 + (n-2)y_3.$$

Then

$$f(y_1, y_2, \ldots, y_n) = K e^{-\sum y_i}, \quad 0 < y_1 < y_2 < \cdots < y_n < \infty$$

$$g(y_1, y_2, y_3) = K_1 e^{-\{y_1 + y_2 + (n-2)y_3\}}, \quad 0 < y_1 < y_2 < y_3 < \infty$$

$$h(u_1, u_2, u_3) = K_2 e^{-u_1}, \quad 0 < |u_3| < u_2 < u_1 < \infty$$

$$k(u_2, u_3) = K_3 e^{-u_2}, \quad 0 < |u_3| < u_2.$$

Let $W = u_2^2/2u_3^2 = b^2/s^2$. Integrating out the extraneous variable and evaluating the constant yields

$$g(w) = (2w)^{-3/2}, \quad w > 1/2.$$

Thus, we conclude that the distribution for n-r=3 is independent of n and only depends on the number of values. We conjecture at this time that this property is true for other values of n-r; but we have not been able to derive the exact distribution for samples where n-r > 3.
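The last integration step can be made explicit with a short worked calculation. It is a sketch under one assumption: that the joint density of $(u_2, u_3)$ depends on $u_2$ only over the region $|u_3| < u_2$, so that the ratio $R = |u_3|/u_2$ (a symbol introduced here for illustration) is uniformly distributed on (0, 1).

```latex
\begin{align*}
P(W^* \le w) &= P\!\left(\frac{u_2^2}{2u_3^2} \le w\right)
              = P\!\left(R \ge (2w)^{-1/2}\right)
              = 1 - (2w)^{-1/2}, \qquad w > \tfrac{1}{2}, \\
g(w) &= \frac{d}{dw}\left[1 - (2w)^{-1/2}\right] = (2w)^{-3/2}.
\end{align*}
```

Since $R \le 1$, the lower limit $w = 1/2$ agrees with the minimum value $[(n-r-1)(n-r-2)]^{-1}$ for n-r = 3, and n does not enter the result.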


5. APPLICATION TO THE NORMAL DISTRIBUTION 
This section uses the censored test rationale for the normal distribution. The matrix T needed for the transformation Z = TY has been studied by Shapiro and Wilk (1962), where they note that T, a lower triangular matrix, is approximately double diagonal. Thus, the $z_i$, i = 2,3,...,n, are represented by the weighted difference of two of the ordered observations. The $z_i$'s are approximately uncorrelated normal random variables.

The test statistic is

$$W^* = b^2/s^2 \qquad (11)$$


and equations (3), (4) and (5) are used to calculate the required 
quantities. The distribution for n=3 can be obtained as 
follows. Using equation (3) it can be shown that 
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2 “es 
Pa ak ask BRA le 


b , (12) 
om’ tz, (eeh-be~) 
Je 3a te oes 
Simplification yields

$$b = (y_3 - y_1)/2m_3.$$

Likewise it is easy to show that
2 2 
2 
EEE Pe gee MEE ea Se a 
2+ 2kt,,) 2 


k “= Geechee, ee 


using the fact that 


m, = Fs ; m, = 0 
2 i ispeay 
sg i vbiling 7 igglichc ba Ss" 


f a3 “a9 * ©o1. "99. “a5. ae 
2 2 
i: fu Fie alae: Vl 


Now applying the transformations 


2 


V3 dies yy te y,t Y3 


3 


Teak 3 oho 


a] 
oS 
I 


eg ANG) Sy 


and then changing to polar coordinates and integrating out r yields

$$f(\theta) = 3/\pi, \quad \pi/3 \le \theta \le 2\pi/3.$$


u) 2 
Letting W= 55g tan §@ then 
"3 
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fac 8 ~-1..-1/2 
g, (W) es (W+1) ~ W reas oa toi eee 


To obtain the distribution of W* we simply note the relation- 


ship between "2 and = and e and is and obtain 


2 3 
gy (We) = A2222739 (4) we M/2 . 362 < We < @, 


To check the minimum bound we know that we must maximize $\sum z_i^2$ subject to the constraints of the problem and this occurs at a vertex subject to the constraints. An easy way of checking is to use the constraints b = 1 and $y_1 \le y_2 \le y_3$. Take the point $(z_1, z_2, z_3)$ using (12), noting that $z_1 = t_{11} y_1$ and hence $y_1 = 0$ and $y_3 = 2m_3$. Then using (13)

$$s^2 = K(0 - 2y_2 + 2m_3)^2 = 4K(m_3 - y_2)^2.$$

This takes a maximum subject to $0 \le y_2 \le 2m_3$ when $y_2 = 0$ or $y_2 = 2m_3$. Thus the maximum value of $s^2$ is $s^2_{\max} = 4K m_3^2 = 2.76$ and from (11) $W^*_{\min} = 0.362$ which agrees with the prior result.


The distributional properties of W* are quite difficult to obtain and it is no longer possible to demonstrate that the percentiles of W* don't depend on both n and r even for n=3. However, examination of the Monte Carlo percentiles, which are unpublished, indicates that they may depend only on n-r. For example,
for n-r=10 the 5% points for n= 10(1)20 are _ .068, 
POGG.I7.066.. £068.00. 065.2..07155.065," .061, 064, 2065, .061, and for 
n- r= 5 the corresponding 5% points for n= 5(1)10 are 
Se (Co ge, a a Me YE a A The 


Further work will be done to determine other properties and 
obtain an accurate table of percentiles. 


6. CONCLUDING REMARKS 


This paper has covered some of the more recent tests for distributional assumptions. The coverage is not complete; for example, the oldest procedure, the chi-square goodness-of-fit test, has been omitted. The authors believe that the procedures cited are among the most powerful available and for the most part can readily be calculated. The user's choice of a particular procedure is a personal matter and should depend on the availability of a computer program to do the calculations or on the individual's understanding of the particular theory behind the test. For most of the competitive procedures presented the power differentials are small. The authors would strongly urge all users of such tests to augment these analytical procedures with a probability plot. The plot can better describe the data than a single test statistic and the two together provide a good method for analyzing data.


A GOODNESS-OF-FIT PROCEDURE FOR TESTING WHETHER
A RELIABILITY GROWTH MODEL FITS DATA THAT SHOW
IMPROVEMENT

SUMMARY. Reliability growth models that have been discussed in
the literature generally assume that an item (such as a piece of electronic equipment) is tested in stages a given number of times, $n_i$ (say) at the ith stage, i = 1,2,...,k. If, at the ith stage, $x_i$ successes are recorded (and hence $n_i - x_i$ failures), the problem is then simply to estimate $p_i$, the probability of success at the ith stage, i = 1,2,...,k. It is assumed that stages are independent of each other and of time. In this article a general parametric reliability growth model is proposed for $p_i$ and a goodness-of-fit test based on the likelihood ratio criterion is proposed. This test is carried out on data that show reliability growth in stages to determine the adequacy of the fit of the proposed reliability growth model.


KEY WORDS. likelihood ratio test, maximum likelihood estimators, stochastic expansions, chi-square statistics.


1. INTRODUCTION


Consider an item (such as a surface-to-air missile) that undergoes testing in independent stages, where at the ith stage of testing there are $n_i$ items placed on test, of which $x_i$ are successes and $n_i - x_i$ failures. Thus, at the ith stage (assuming Bernoulli trials within each stage) the probability that there are exactly $x_i$ successes and $n_i - x_i$ failures is given by the binomial model; i.e.,

$$\Pr\{X = x_i\} = \binom{n_i}{x_i} p_i^{x_i} (1-p_i)^{n_i - x_i}, \qquad (1)$$

$x_i = 0,1,\ldots,n_i$, wherein $p_i$ is the theoretical probability that any item under test at the ith stage is a success and $1-p_i$ is the corresponding failure probability; i = 1,2,...,k.


The object in testing the item in stages is to allow further 
development and design improvement based on test results which 
will increase the probability of success from stage to stage. 

The ultimate goal is to design a reliable item. 


If it is assumed that the probability of success increases 
from stage to stage, subject to random fluctuations, it is 
important to propose and develop models to assess (estimate) the 
item reliability at and after each stage of testing and to pre- 
dict the item reliability of future stages. 


Lloyd and Lipow (1962), Gross and Kamins (1968), and Gross and Clark (1975) describe two parametric reliability growth models: the hyperbolic and exponential growth models. These models are described briefly. The hyperbolic growth model defines $p_i$ as

$$p_i = p_\infty - \alpha/i, \qquad (2)$$

where $p_\infty$, $0 < p_\infty < 1$, is the ultimate achievable success probability, and $\alpha > 0$ is a second parameter that quantifies growth between stages 1 and k. The exponential growth model characterizes $p_i$ as

$$p_i = 1 - \alpha_1 \exp(-\alpha_2 i) \qquad (3)$$

with $0 < \alpha_1 < 1$ and $\alpha_2 > 0$ as the parameters of the model, i = 1,2,...,k. The parameter $\alpha_2$ quantifies growth whereas $1 - \alpha_1$ is the minimum hypothesized success probability for this model.

The parameters of the models (2) and (3) are estimable through the method of maximum likelihood (m.l.). A detailed discussion of the estimation procedure, complete with numerical examples, is found in Gross and Clark (1975; ch. 5). Methods are also developed for predicting the lower confidence limit of $p_{k+1}$, the success probability at stage k+1, given k stages of testing.
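The two growth curves are simple enough to state as functions. The sketch below only evaluates models (2) and (3) at a given stage; the function names are hypothetical and no estimation of the parameters is attempted.

```python
import math


def hyperbolic_growth(p_inf, alpha, stage):
    """Model (2): success probability p_inf - alpha / stage."""
    return p_inf - alpha / stage


def exponential_growth(alpha1, alpha2, stage):
    """Model (3): success probability 1 - alpha1 * exp(-alpha2 * stage)."""
    return 1.0 - alpha1 * math.exp(-alpha2 * stage)
```

Both curves increase with the stage number when the growth parameter is positive; the hyperbolic curve approaches `p_inf` and the exponential curve approaches one.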


2. REVIEW OF RELIABILITY GROWTH MODEL LITERATURE 


There is an ever-growing bibliography in the area of relia- 
bility growth. No attempt is made to give a complete listing of 
all published work in this field; however, some of the more 
important articles are cited below. 


Lloyd and Lipow (1962) provide one of the earliest discus- 
sions of reliability growth models in which they include the 
hyperbolic and exponential models, (2) and (3), respectively. 
Other parametric growth models are considered by Berndt (1966), 
Bresenham (1964), and Zellner and Lee (1965). Gross and Kamins 
(1968) and Gross and Clark (1975) develop methodology for esti- 
mating the hyperbolic and exponential reliability growth model 
parameters. Their method, as previously stated, employs m.l.
estimation procedures. Furthermore, Gross and Clark demonstrate 
(in Chapter 5) how the logistic model may be viewed as a relia- 
bility growth model and proceed to apply the logistic model in a 
reliability growth setting. 


Nonparametric procedures have also been considered in devel- 
oping reliability growth models. A key paper in this area was 
written by Barlow and Scheuer (1966). Their model assumes an 
item that fails, fails according to an assignable cause that can 
be corrected while the item undergoes testing; or the item fails 
due to an inherent cause that cannot be corrected without major 
redesign of the item. They then obtain the (nonparametric) m.l. estimators of the stage-to-stage success probabilities $p_i$, i = 1,2,...,k, under the constraint that the $p_i$'s form a non-decreasing sequence. The basic problem in obtaining constrained m.l. estimators is first treated by Ayer et al. (1955).

Bayes methods have also been used in the development and analyses of reliability growth models. Cozzolino (1966) uses Bayes procedures in making minimum cost decisions on testing times and burn-in procedures for a general class of reliability growth models. Assuming that the parameters of a reliability growth model are random variables having appropriate (prior) density functions, Pollock (1968) develops a model to project item reliability at some time after the start of testing. The model then predicts reliability of the item after the failure data are observed. Dahiya and Gross (1974) extend an earlier model, Gross (1971), by developing an empirical Bayes procedure to assess reliability growth of an item tested in independent stages. Finally, in the spirit of Bayes, Weinrich and Gross (1978) use a Bayes procedure that utilizes a Dirichlet prior density function on the two failure probabilities and the success probability in the Barlow-Scheuer model to improve upon the estimates of reliability at each stage that were obtained by Barlow and Scheuer (1966).


3. GENERAL PARAMETRIC GROWTH MODELS AND TEST PROCEDURES 


Let $p_i(\alpha)$ be the theoretical probability of success of an item at the ith stage of testing where $\alpha' = (\alpha_1, \alpha_2, \ldots, \alpha_r)$ is a vector of parameters and i = 1,2,...,k; k > r. The exponential model is an example of the case r = 2,

$$p_i(\alpha_1, \alpha_2) = 1 - \alpha_1 \exp(-\alpha_2 i) \qquad (3)$$

i = 1,2,...,k, and $\alpha_1$ and $\alpha_2$ are the parameters of the model. The unrestricted probability of item success at the ith stage is $p_i$ (independent of $\alpha$).


A test of the null hypothesis $H_0: p_i = p_i(\alpha)$ against the alternative $H_1: p_i \ne p_i(\alpha)$, $p_i$ unrestricted, is carried out by the likelihood ratio criterion. Thus let

$$\lambda = \prod_{i=1}^{k} \frac{p_i(\hat{\alpha})^{x_i} \, q_i(\hat{\alpha})^{n_i - x_i}}{\hat{p}_i^{\,x_i} \, \hat{q}_i^{\,n_i - x_i}}, \qquad (4)$$

where $\hat{\alpha}'$ is the m.l. estimator vector of $\alpha'$, $\hat{p}_i = x_i/n_i$ is the m.l. estimator of $p_i$, $q_i(\hat{\alpha}) = 1 - p_i(\hat{\alpha})$, and $\hat{q}_i = 1 - \hat{p}_i$, i = 1,2,...,k. The statistic $-2 \ln \lambda$ is approximately distributed as a chi-square variable with (k-r) degrees of freedom when $H_0$ is true. Thus, $H_0$ is rejected at level $\alpha$ only if

$$-2 \ln \lambda > \chi^2_{k-r,\,1-\alpha},$$

where $\chi^2_{k-r,\,1-\alpha}$ is the $(1-\alpha)100$ percentage point of the chi-square distribution with (k-r) degrees of freedom.
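The likelihood ratio statistic can be evaluated directly once fitted stage probabilities are available. The sketch below assumes the fitted probabilities are supplied by the caller (the maximum likelihood fit itself is not shown) and uses the convention that a term with zero observed count contributes nothing; the function names are hypothetical.

```python
import math


def _count_times_log_ratio(observed, expected):
    """observed * ln(observed / expected), taken as 0 when observed is 0."""
    if observed == 0:
        return 0.0
    return observed * math.log(observed / expected)


def neg2_log_lambda(x, n, p_fitted):
    """-2 ln(lambda) for successes x out of n trials with fitted probabilities."""
    total = 0.0
    for xi, ni, pi in zip(x, n, p_fitted):
        total += _count_times_log_ratio(xi, ni * pi)
        total += _count_times_log_ratio(ni - xi, ni * (1.0 - pi))
    return 2.0 * total
```

The result would be compared with the chi-square percentage point on k-r degrees of freedom; when the fitted probabilities equal the observed proportions the statistic is zero.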


In order to simplify the test procedure, $-2 \ln \lambda$ is now studied more closely.

$$-2 \ln \lambda = 2 \sum_i x_i \ln[x_i/\{n_i p_i(\hat{\alpha})\}] + 2 \sum_i (n_i - x_i) \ln[(n_i - x_i)/\{n_i q_i(\hat{\alpha})\}]. \qquad (5)$$

Let $d_i = x_i - n_i p_i(\hat{\alpha})$, i = 1,2,...,k. Then, it follows that $n_i q_i(\hat{\alpha}) = (n_i - x_i) + d_i$. Thus,

$$-2 \ln \lambda = -2 \sum_i (n_i - x_i) \ln\{1 + d_i/(n_i - x_i)\} - 2 \sum_i x_i \ln(1 - d_i/x_i).$$

Theorem 1. Statistic $W_1$ where

$$W_1 = \sum_i d_i^2 [\{n_i p_i(\hat{\alpha})\}^{-1} + \{n_i q_i(\hat{\alpha})\}^{-1}], \qquad (6)$$

is asymptotically distributed as a chi-square variable with k-r degrees of freedom, when $H_0$ is true.


Proof. Consider the stochastic expansions of $\ln\{1 + d_i/(n_i - x_i)\}$ and $\ln(1 - d_i/x_i)$. It follows that,

$$-2 \ln \lambda = 2 \sum_i x_i \{A + A^2/2 + A^3/3 + \cdots\} - 2 \sum_i (n_i - x_i) \{B - B^2/2 + B^3/3 - \cdots\}$$

where $A = d_i/x_i$ and $B = d_i/(n_i - x_i)$. Thus, one finds

$$-2 \ln \lambda = \sum_i d_i^2 [x_i^{-1} + (n_i - x_i)^{-1}] + \text{terms in } d_i^a,$$

$a \ge 3$. Now, for each a, the term in $d_i^a$ is, apart from a numerical factor,

$$d_i^a [x_i^{-(a-1)} + (-1)^a (n_i - x_i)^{-(a-1)}] = d_i^2 [x_i^{-1} (d_i/x_i)^{a-2} + (-1)^a (n_i - x_i)^{-1} \{d_i/(n_i - x_i)\}^{a-2}].$$

Consider the term $d_i/x_i$ which can be written as

$$\frac{d_i}{x_i} = \frac{\{x_i/n_i - p_i(\alpha)\} + \{p_i(\alpha) - p_i(\hat{\alpha})\}}{x_i/n_i}.$$

When $H_0$ is true, $x_i/n_i \to p_i(\alpha)$ and $p_i(\hat{\alpha}) \to p_i(\alpha)$ in probability. Thus, $d_i/x_i \to 0$ and similarly $d_i/(n_i - x_i) \to 0$ in probability. Furthermore,

$$d_i^2/x_i = d_i^2/\{n_i p_i(\hat{\alpha}) + d_i\} = [d_i^2/\{n_i p_i(\hat{\alpha})\}] / [1 + d_i/\{n_i p_i(\hat{\alpha})\}].$$

Since $d_i/n_i \to 0$ in probability, for $n_i$ large $d_i^2/x_i \approx d_i^2/n_i p_i(\hat{\alpha})$ and $d_i^2/(n_i - x_i) \approx d_i^2/n_i q_i(\hat{\alpha})$. Both $d_i^2/n_i p_i(\hat{\alpha})$ and $d_i^2/n_i q_i(\hat{\alpha})$ are bounded with probability 1. Thus,

$$-2 \ln \lambda - \sum_i d_i^2 [\{n_i p_i(\hat{\alpha})\}^{-1} + \{n_i q_i(\hat{\alpha})\}^{-1}] \to 0$$

in probability as $n_i \to \infty$. Let $W_0 = -2 \ln \lambda$. Thus, as $n_i \to \infty$, i = 1,2,...,k, $W_1 - W_0 \to 0$ in probability and the distribution of $W_1$ is chi-square with k-r degrees of freedom when $H_0$ is true. Q.E.D.

Note that $W_1$ is analogous to the statistic for goodness-of-fit that Berkson (1953) introduces. Note that no rationale is given for the Berkson statistic. Such a rationale is the raison d'être for this article.


A generalization of Theorem 1 is obtainable. Suppose the null hypothesis under test is $H_0: p_i = p_i(\alpha^0_{r-s}, \alpha_s)$ vs. $H_A: p_i = p_i(\alpha)$. That is, the test is now that r-s of the parameters of the model $p_i(\alpha)$ are specified under $H_0$ as opposed to the alternative that no parameters are specified, i = 1,2,...,k; r > s.

Theorem 2. Define $\hat{p}_{i0} = p_i(\alpha^0_{r-s}, \hat{\alpha}_s)$ and $\hat{p}_i = p_i(\hat{\alpha})$, i = 1,2,...,k. Then statistic $W_2$ is asymptotically distributed as a chi-square variable with r-s degrees of freedom when $H_0$ is true; where

$$W_2 = \sum_{i=1}^{k} \left[ \frac{x_i (\hat{p}_i - \hat{p}_{i0})(3\hat{p}_{i0} - \hat{p}_i)}{\hat{p}_{i0}^2} + \frac{(n_i - x_i)(\hat{q}_i - \hat{q}_{i0})(3\hat{q}_{i0} - \hat{q}_i)}{\hat{q}_{i0}^2} \right],$$

$\hat{q}_i = 1 - \hat{p}_i$, $\hat{q}_{i0} = 1 - \hat{p}_{i0}$, i = 1,2,...,k.

Proof. Similar to that of Theorem 1.
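A sketch of the $W_2$ computation follows, assuming both sets of fitted probabilities (under the restricted and the full model) have already been obtained. The function name `w2_statistic` and the argument names are hypothetical. Each term is the second-order approximation $2\delta - \delta^2$ to $2\ln(1+\delta)$ with $\delta = (\hat{p}_i - \hat{p}_{i0})/\hat{p}_{i0}$, which is where the factor $(3\hat{p}_{i0} - \hat{p}_i)$ comes from.

```python
def w2_statistic(x, n, p_full, p_restricted):
    """W_2 comparing fitted probabilities under the full and restricted models."""
    total = 0.0
    for xi, ni, p1, p0 in zip(x, n, p_full, p_restricted):
        q1, q0 = 1.0 - p1, 1.0 - p0
        total += xi * (p1 - p0) * (3.0 * p0 - p1) / p0 ** 2
        total += (ni - xi) * (q1 - q0) * (3.0 * q0 - q1) / q0 ** 2
    return total
```

When the two sets of fitted probabilities coincide the statistic is zero, and for nearby values it is close to the exact log likelihood ratio $2\sum[x_i \ln(\hat{p}_i/\hat{p}_{i0}) + (n_i - x_i)\ln(\hat{q}_i/\hat{q}_{i0})]$.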


4. AN EXAMPLE APPLIED TO THE HYPERBOLIC RELIABILITY GROWTH MODEL 


The hyperbolic reliability growth model, given by (2), was fitted to the following clinical trial situation that is discussed by Gross and Clark (1975, pp. 173-176). Patients with acute leukemia were tested in five stages, 10 new patients selected at each stage. Between stages, medication was modified so that the proportion of patients achieving remission from stage to stage should increase. The proportions of patients achieving remission at each stage are 3/10, 6/10, 7/10, 7/10, and 8/10, for stages 1 through 5. The hyperbolic reliability growth model, (2), was proposed in this case. Gross and Clark show that the m.l. estimates of $p_\infty$ and $\alpha$ are, respectively, $\hat{p}_\infty = 0.8891$ and $\hat{\alpha} = 0.5892$. This leads to the following table of numbers of observed and expected successes at each stage.

TABLE 1: Observed and expected number of successes at each of five stages of the leukemia clinical trial.

-----------------------------------------------------
          Observed       Expected Successes
Stage     Successes      (Hyperbolic Model)
-----------------------------------------------------
  1          3                2.999
  2          6                5.945
  3          7                6.927
  4          7                7.418
  5          8                7.713
-----------------------------------------------------


When $H_0$ is true the statistic $W_1$ is approximately a chi-square variable with three degrees of freedom. Now

$$\begin{aligned}
W_1 &= (3-2.999)^2[(2.999)^{-1} + (7.001)^{-1}] \\
    &\quad + (6-5.945)^2[(5.945)^{-1} + (4.055)^{-1}] \\
    &\quad + (7-6.927)^2[(6.927)^{-1} + (3.073)^{-1}] \\
    &\quad + (7-7.418)^2[(7.418)^{-1} + (2.582)^{-1}] \\
    &\quad + (8-7.713)^2[(7.713)^{-1} + (2.287)^{-1}] = 0.140.
\end{aligned}$$

Now $\chi^2_{3,\,0.90} \approx 6.25$, indicating the fit of the hyperbolic model to these data is excellent.
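The arithmetic of this example can be checked with a few lines of code. This is a sketch only: the function name `w1_statistic` is hypothetical, the data and parameter estimates are those quoted in the example, and the critical value 6.25 is taken from the text rather than computed.

```python
def w1_statistic(x, n, p_fitted):
    """W_1 = sum d_i^2 [1/(n_i p_i) + 1/(n_i q_i)] with d_i = x_i - n_i p_i."""
    total = 0.0
    for xi, ni, pi in zip(x, n, p_fitted):
        d = xi - ni * pi
        total += d * d * (1.0 / (ni * pi) + 1.0 / (ni * (1.0 - pi)))
    return total


observed = [3, 6, 7, 7, 8]
trials = [10] * 5
p_inf_hat, alpha_hat = 0.8891, 0.5892
fitted = [p_inf_hat - alpha_hat / stage for stage in range(1, 6)]
w1 = w1_statistic(observed, trials, fitted)
fit_is_acceptable = w1 < 6.25  # 90 percent point on 3 degrees of freedom
```

With unrounded expected counts the statistic comes out at roughly 0.14, far below the critical value, in line with the conclusion of the example.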
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CHI-SQUARE GOODNESS-OF-FIT TESTS BASED ON 
DEPENDENT OBSERVATIONS 


KAMAL C. CHANDA* 


Department of Mathematics 
Texas Tech University 
Lubbock, Texas 79409 USA 


SUMMARY. An attempt has been made in this article to investigate the sampling properties of standard goodness-of-fit chi-square tests based on interdependent observations which are obtained from a strictly stationary and strong-mixing random process. We conclude that the null distributions of the test statistics are largely determined by the multivariate probability structure of the process. A few examples are used to illustrate this point.


KEY WORDS. goodness-of-fit tests, chi-square tests, strictly 
stationary and strong mixing processes, linear processes. 


1. INTRODUCTION 


Let $\{X_t : t = 0, \pm 1, \ldots\}$ be a sequence of independent
and identically distributed (i.i.d.) random variables (r.v.'s)
with a common distribution function (d.f.) $G$. If one wants to
test various kinds of hypotheses $H$ concerning $G$, based on a
sample $X_1, \ldots, X_n$, one would probably like to use the
classical goodness-of-fit tests based on such statistics as the
chi-square statistics, Kolmogorov-Smirnov statistics, or other
statistics which exploit the characterization properties of $G$
under $H$. The simplest and most commonly used among these are
the chi-square goodness-of-fit tests, of which a comprehensive
account is available in Watson (1959). A problem arises,
however, when it is known a priori that the observations
$X_1, \ldots, X_n$ are not independent but $G$ is still the d.f.
of $X_t$, implying strict stationarity for $\{X_t\}$, in which
case it is essential that some information regarding the
structure of dependence of the sequence $\{X_t\}$ be available
before an appropriate chi-square test can be formulated. If we
use the standard chi-square statistics in such situations, the
resulting tests might have properties which are substantially
different from those expected, for both small and large samples.
Bartlett (1950) and Patankar (1954) have illustrated this point
in some detail for Markov chains and stationary Gaussian
sequences when $H$ is simple.


Since a meaningful probability treatment of strict stationarity
will, possibly, require some assumption under which observations
far separated from each other are nearly independent, we shall
assume throughout that $\{X_t\}$ is either a strong-mixing
process with mixing coefficients $\alpha(v)$, $(v = 1, 2, \ldots)$
(for the definition of such processes see Rozanov, 1967, p. 180)
or a linear process defined by
$X_t = \sum_{v=0}^{\infty} g_v W_{t-v}$, where
$\{W_t; t = 0, \pm 1, \ldots\}$ is a sequence of i.i.d. r.v.'s
and the infinite sum converges to $X_t$ in some stochastic sense.
There are features of similarity common to these two processes
but a linear process is not, in general, strong-mixing, although
it can be so under some additional conditions (see Chanda, 1974,
and Gorodetskii, 1977). These two processes together constitute
possibly the widest such class of strictly stationary processes
encountered in the literature.


We further assume that $G(x) = G(x;\theta)$, where $\theta$ is a
parameter vector, and that $G$ is absolutely continuous with a
probability density function (p.d.f.) $g$. The support of $G$ is
the interval $(a,b)$, which is divided into $r$ class intervals
$B_j = (a_{j-1}, a_j)$, $1 \le j \le r$, $(a_0 = a, a_r = b)$.
We consider two situations. (I) $G$ is fully specified under $H$,
in which case the class probabilities are given by

$$p_j = G(a_j;\theta) - G(a_{j-1};\theta), \quad (1 \le j \le r). \qquad (1)$$

(II) $G$ is fully specified under $H$ except that $\theta$ is
unknown and is estimated by a statistic $\hat\theta$, in which
case we estimate the class probabilities by

$$\hat p_j = G(a_j;\hat\theta) - G(a_{j-1};\hat\theta), \quad (1 \le j \le r). \qquad (2)$$


Sections 2 and 3 contain the main body of results followed 
by general discussion. The proofs of the results stated in the 
form of theorems and comments are withheld and will be reported 
elsewhere. 


2. CHI-SQUARE GOODNESS-OF-FIT TESTS 


In the classical tradition we define the chi-square
statistics by

$$X^2 = \sum_{j=1}^{r} (n_j - np_j)^2 / np_j \quad \text{in case (I)} \qquad (3)$$

$$\hat X^2 = \sum_{j=1}^{r} (n_j - n\hat p_j)^2 / n\hat p_j \quad \text{in case (II)} \qquad (4)$$

where $n_j$ = number of observations $X_t$ in the class interval
$B_j$ $(1 \le j \le r)$.

2.1. Case (I): $p_j$ $(1 \le j \le r)$ are known. The following
theorem describes the asymptotic behavior of the chi-square
statistic.


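Theorem 2.1. Let $X^2$ be as defined in (3), and let
$\{X_t : t = 0, \pm 1, \ldots\}$ be a strictly stationary
strong-mixing process with $\alpha(v) = O(v^{\,\cdots})$ for some
$\epsilon > 0$. Assume that $p_j \ge c$, $1 \le j \le r$, for
some $c > 0$. Let $\Lambda = [\lambda_{ij}]$ where

$$\lambda_{ij} = \sum_{v=-\infty}^{\infty} \lambda_{ij}(v), \qquad
  \lambda_{ij}(v) = [p_{ij}(v) - p_i p_j](p_i p_j)^{-1/2}, \quad v = 0, \pm 1, \ldots, \qquad (5)$$

$$p_{ij}(v) = P(X_t \in B_i, X_{t+v} \in B_j), \quad v = \pm 1, \pm 2, \ldots,$$

$$p_{ij}(0) = p_i \ \text{if } i = j, \qquad p_{ij}(0) = 0 \ \text{if } i \ne j.$$

Then, as $n \to \infty$,

$$L(X^2) \to L\Big(\sum_{j=1}^{r-1} T_j \chi_j^2\Big), \qquad (6)$$

where $\chi_j^2$ are mutually independent r.v.'s each with a
chi-square distribution with 1 degree of freedom and
$T_1, \ldots, T_{r-1}$ and 0 are the eigenvalues of $\Lambda$.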
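
A small simulation can make the effect of dependence on the classical statistic concrete. The sketch below is an illustration under stated assumptions, not part of the paper: it takes a Gaussian first-order autoregressive process with unit variance, uses $r = 2$ equiprobable cells split at zero, and relies on the standard orthant formula $P(X_t > 0, X_{t+v} > 0) = 1/4 + \arcsin(\rho^v)/2\pi$, from which the single nonzero eigenvalue is $T_1 = 1 + (4/\pi)\sum_{v \ge 1} \arcsin \rho^v$. All function names are hypothetical.

```python
import math
import random

def t1_median_split(rho, terms=200):
    """T_1 for r = 2 equiprobable cells of a Gaussian AR(1) process.

    Uses P(X_t > 0, X_{t+v} > 0) = 1/4 + arcsin(rho**v) / (2*pi).
    """
    s = sum(math.asin(rho ** v) for v in range(1, terms + 1))
    return 1.0 + 4.0 * s / math.pi

def pearson_x2_two_cells(x):
    """Classical X^2 with cells (-inf, 0] and (0, inf), p_1 = p_2 = 1/2."""
    n = len(x)
    n1 = sum(1 for v in x if v <= 0.0)
    e = n / 2.0
    return (n1 - e) ** 2 / e + ((n - n1) - e) ** 2 / e

def simulate_mean_x2(rho, n=400, reps=400, seed=1):
    """Monte Carlo mean of X^2 over `reps` stationary AR(1) paths of length n."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)       # keeps the marginal variance at 1
    total = 0.0
    for _ in range(reps):
        x_prev = rng.gauss(0.0, 1.0)      # draw from the stationary law
        xs = [x_prev]
        for _ in range(n - 1):
            x_prev = rho * x_prev + sd * rng.gauss(0.0, 1.0)
            xs.append(x_prev)
        total += pearson_x2_two_cells(xs)
    return total / reps

rho = 0.5
print(t1_median_split(rho), simulate_mean_x2(rho))
```

Since the limit law is $T_1\chi_1^2$ in this case, its mean is $T_1$; the simulated mean should sit near that value and well above 1, the mean under independence.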


Theorem 2.1'. Let $\{X_t\}$ be a strictly stationary linear
process defined by $X_t = \sum_{v=0}^{\infty} g_v W_{t-v}$, where
$\{W_t\}$ is a sequence of i.i.d. r.v.'s with
$E(|W_t|^{\delta}) < \infty$ for some $\delta > 0$ and
$\sum_{v=0}^{\infty} v|g_v|^{\lambda} < \infty$,
$\lambda = \delta/2(1 + \delta)$. Let the remaining conditions of
Theorem 2.1 be satisfied. Then (6) holds.
Observe that although $\lambda_{ij}(0)$ $(1 \le i,j \le r)$ are
known because the $p_j$'s are given, the quantities
$\lambda_{ij}(v)$ $(v \ne 0;\ 1 \le i,j \le r)$ are not usually
known. If these are given as additional conditions under the
hypothesis $H$ then we replace $X^2$ by

$$X_*^2 = Z'\Lambda^- Z, \qquad (7)$$

where $Z = [Z_1, \ldots, Z_r]'$ with
$Z_j = (n_j - np_j)/(np_j)^{1/2}$, so that $X^2 = Z'Z$, and
$\Lambda^-$ is the (unique) g-inverse of $\Lambda$ (for the
definition of g-inverse see Graybill, 1976, p. 24). It can be
shown that

$$L(X_*^2) \to L(\chi^2(d)), \qquad (8)$$

where $\chi^2(d)$ has a chi-square distribution with $d$ degrees
of freedom, $d = \mathrm{rank}(\Lambda) \le r-1$.

Note that if $\{X_t\}$ is a sequence of i.i.d. r.v.'s then
$\Lambda = I - pp'$ where $p' = [p_1^{1/2}, \ldots, p_r^{1/2}]$,
so that $T_j = 1$, $1 \le j \le r-1$, and $d = r-1$. It follows
immediately that $\Lambda^- = \Lambda$, and since $p'Z = 0$, we
have that $X_*^2 = Z'\Lambda^- Z = Z'\Lambda Z = Z'Z = X^2$.

In general, it is difficult to compute $\Lambda^-$ even if
analytic expressions for $p_{ij}(v)$ are known. For example, if
$\{X_t\}$ is Gaussian with zero mean, unit variance and
autocorrelation function $\rho(v)$ then it is well known that for
$v \ne 0$,


Cc 
a) o at <t< <u<o 
P44 2 fe) tic ead Oe where ae Cis tf <r, Ono ) 


are given in Section 4. Even if $\rho(v)$ is completely known,
$\Lambda^-$ is hard to compute. It is, therefore, desirable to
find out if there exist estimates $\hat\lambda_{ij}$ (of
$\lambda_{ij}$) which can be used to modify $X^2$ so that this
new statistic has a simple asymptotic null distribution. The
result of the following theorem shows that such a task is
possible to accomplish.


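Theorem 2.2. Define

$$\hat\lambda_{ij}(v) = (n-v)^{-1} \sum_{t=1}^{n-v} Y_i(t) Y_j(t+v), \qquad (9)$$

$$\hat\lambda_{ij}(-v) = \hat\lambda_{ji}(v),$$

$$Y_i(t) = [I_i(X_t) - p_i]\, p_i^{-1/2}, \qquad
  I_i(x) = 1 \ \text{if } x \in B_i, \quad I_i(x) = 0 \ \text{otherwise},$$

$0 \le v \le n-1$, $1 \le i,j \le r$. Let $\{\ell_n\}$ be a
sequence of positive integers such that $\ell_n \to \infty$ but
$\ell_n/n \to 0$ as $n \to \infty$, and let the mixing
coefficient $\alpha(v)$ satisfy the condition of Theorem 2.1.
Define

$$\tilde X^2 = Z'\hat\Lambda^- Z, \qquad
  \hat\Lambda = [\hat\lambda_{ij}], \quad
  \hat\lambda_{ij} = \sum_{|v| \le \ell_n} \hat\lambda_{ij}(v), \qquad (10)$$

where $\hat\Lambda^-$ is the g-inverse of $\hat\Lambda$. Then as
$n \to \infty$

$$L(\tilde X^2) \to L(\chi^2(d)), \qquad (11)$$

where $d = \mathrm{rank}(\Lambda)$.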
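
The lag-product estimates in Theorem 2.2 are straightforward to compute. The sketch below is a rough illustration only: it assumes that the estimate of $\Lambda$ is the sum of the lag estimates over $|v| \le \ell_n$ with $\hat\lambda_{ij}(-v) = \hat\lambda_{ji}(v)$, which is how the garbled definition appears to read, and it stops short of the g-inverse step. The function names and the `cuts`/`probs` arguments are hypothetical.

```python
def indicator_scores(x, cuts, probs):
    """Y_i(t) = [I_i(X_t) - p_i] / sqrt(p_i) for cells separated by `cuts`."""
    r = len(probs)
    rows = []
    for v in x:
        cell = sum(1 for c in cuts if v > c)   # index of the cell containing v
        rows.append([((1.0 if i == cell else 0.0) - probs[i]) / probs[i] ** 0.5
                     for i in range(r)])
    return rows

def lag_matrix(y, v):
    """lambda_hat_ij(v) = (n - v)^{-1} sum_t Y_i(t) Y_j(t + v), for v >= 0."""
    n, r = len(y), len(y[0])
    return [[sum(y[t][i] * y[t + v][j] for t in range(n - v)) / (n - v)
             for j in range(r)] for i in range(r)]

def lambda_hat(x, cuts, probs, max_lag):
    """Assumed truncated sum over |v| <= max_lag (negative lags by transposition)."""
    y = indicator_scores(x, cuts, probs)
    r = len(probs)
    lam = lag_matrix(y, 0)
    for v in range(1, max_lag + 1):
        m = lag_matrix(y, v)
        for i in range(r):
            for j in range(r):
                lam[i][j] += m[i][j] + m[j][i]
    return lam
```

With `max_lag = 0` and equiprobable cells the trace of the result equals $r - 1$ exactly, which matches the i.i.d. value $\Lambda = I - pp'$; larger values of `max_lag` pick up the serial dependence.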


Theorem 2.2'. Let $\{X_t\}$ be a linear process satisfying the
conditions of Theorem 2.1'. Then if the remaining conditions of
Theorem 2.2 are satisfied, (11) holds.


2.2. Case (II): $p_j$ $(1 \le j \le r)$ are estimated. We now
consider the situation where $p_j$ are unknown. We assume that
$\theta$ is a parameter vector with $s$ elements $(s < r)$. Let
$\theta$ be estimated by the estimator $\hat\theta$. We assume
that

$$n^{1/2}(\hat\theta - \theta) = n^{-1/2} \sum_{t=1}^{n} H_t + o_p(1),$$

where $H_t = h(X_t)$ $(h' = [h_1, \ldots, h_s])$ for some
function $h$, $E(H_t) = 0$ and $E(\|H_t\|^{2+\eta}) < \infty$ for
some $\eta > 0$. Let $Y_t' = [Y_1(t), \ldots, Y_r(t)]$. Write
$p_{ij} = (\partial p_i/\partial\theta_j)\, p_i^{-1/2}$,
$P = [p_{ij}]$,
$E(H_t Y_{t+v}') = \Delta(v) = [\delta_{ij}(v)]$,
$E(H_t H_{t+v}') = R(v) = [\rho_{ij}(v)]$. We then have the
following result.


Theorem 2.3. Let the conditions of Theorem 2.1 hold true and
let $|p_{ij}| < M$ for every $i$, $j$ and $\theta$, $M$ being a
finite positive constant independent of $\theta$. Assume that
$\eta > 4/\epsilon$. Then
$\Delta = \sum_{v=-\infty}^{\infty} \Delta(v)$ and
$R = \sum_{v=-\infty}^{\infty} R(v)$ converge absolutely and, as
$n \to \infty$,

$$L(\hat X^2) \to L\Big(\sum_{j=1}^{r-1} T_j^* \chi_j^2\Big), \qquad (12)$$

where $T_j^*$ $(1 \le j \le r-1)$ and 0 are the eigenvalues of
$\Lambda^* = \Lambda - P\Delta - \Delta'P' + PRP'$ and $\chi_j^2$
$(1 \le j \le r-1)$ are mutually independent r.v.'s each with a
chi-square distribution with one degree of freedom.


Note that if $\{X_t\}$ is a sequence of i.i.d. r.v.'s then

$$\Lambda^* = \Lambda^*(0) = I - pp' - P\Delta(0) - \Delta'(0)P' + PR(0)P'. \qquad (13)$$


A reformulation of Theorem 2.3 in terms of linear processes 
is too complicated and hence withheld. 


3. EFFECTS OF ESTIMATION OF PARAMETERS ON $\Lambda^*$


We now investigate different methods of estimating $\theta$ and
the effect of such estimation on the matrix $\Lambda^*$.


A. Suppose $\hat\theta$ is the 'restricted' maximum likelihood
estimator of $\theta$ derived as a suitable solution of

$$\sum_{t=1}^{n} \partial \ln g(X_t;\theta)/\partial\theta_j = 0, \quad 1 \le j \le s. \qquad (14)$$

Then under a set of regularity conditions (see Cramér, 1946, pp.
500-501) on $g$ it can be established that strict stationarity
and the strong mixing property of $\{X_t\}$ will imply that

$$n^{1/2}(\hat\theta - \theta) = n^{-1/2} J^{-1} \sum_{t=1}^{n} L(X_t) + o_p(1), \qquad (15)$$

where $J = [J_{ij}]$,
$J_{ij} = -E(\partial^2 \ln g(X_t;\theta)/\partial\theta_i \partial\theta_j)$,
$1 \le i,j \le s$, $L' = [L_1, \ldots, L_s]$,
$L_j(x) = \partial \ln g(x;\theta)/\partial\theta_j$.

Thus for this particular case we can write
$h(x) = J^{-1} L(x)$.


B. Suppose $\hat\theta$ is the 'restricted' modified minimum
chi-square estimator of $\theta$ defined as an appropriate
solution of

$$\sum_{j=1}^{r} n_j\, \partial \ln p_j/\partial\theta_i = 0, \quad 1 \le i \le s. \qquad (16)$$


Then the following result holds. 


Theorem 3.1. Let $\{X_t\}$ satisfy the condition of Theorem 2.1
or 2.1'. Let $\theta$ belong to an admissible set $\Theta$ and
let its true value $\theta^0$ be an interior point of $\Theta$.
We assume that (i) for every $\theta, \theta^* \in \Theta$,
$\theta \ne \theta^*$, $p_j(\theta) \ne p_j(\theta^*)$ for at
least one $j$ $(1 \le j \le r)$, (ii)
$\partial p_j(\theta)/\partial\theta_i$
$(1 \le j \le r,\ 1 \le i \le s)$ exist and are continuous at
$\theta^0$, and (iii) $Q = P'P$ is nonsingular at
$\theta = \theta^0$. Then there exists a consistent root
$\hat\theta$ of (16) such that


Alb 2 0) = 1-705 
nl?@ - 9° ~ guint) #0 a7) 


“ i 
where Q% = Qe"), Me = ie 0 M. = ) 


int’ “en! aa RS wie: p,/285) 


and ve = M, (ay 
jn 4tiw 


It follows, immediately, that for this case
$H_t = (P'P)^{-1} P' Y_t$, and therefore

$$\Lambda^* = (I - S)\Lambda(I - S), \qquad (18)$$

where $I$ is an identity matrix of order $r$ and
$S = P(P'P)^{-1}P'$.


C. Let $g(x;\theta) = f((x - \theta_1)/\theta_2)$ and assume that
$f$ is symmetric about zero. Estimate $\theta_1$ by the sample
median $\hat\theta_1$ and $\theta_2$ by
$\hat\theta_2$ = interquartile range$/2c$, where
$\int_{-\infty}^{c} f(x)\,dx = 3/4$.

Then it can be proved that for linear processes defined earlier
(see Chanda, 1976; and Dutta and Sen, 1971)

$$n^{1/2}(\hat\theta_1 - \theta_1) = (n^{1/2} f(0))^{-1} \sum_{t=1}^{n} \phi_1(X_t) + o_p(1), \qquad (19)$$

$$n^{1/2}(\hat\theta_2 - \theta_2) = (2n^{1/2} c f(c))^{-1} \sum_{t=1}^{n} \phi_2(X_t) + o_p(1),$$

where $\phi_1(x) = 1/2 - a(x - \theta_1)$,
$\phi_2(x) = a(x - \xi_1) - a(x - \xi_2) + 1/2$, $a(x) = 1$ if
$x < 0$ and $a(x) = 0$ if $x \ge 0$, and
$\xi_1 = \theta_1 - c\theta_2$, $\xi_2 = \theta_1 + c\theta_2$.
Therefore, in this case

$$H_t' = [(f(0))^{-1} \phi_1(X_t),\ (2cf(c))^{-1} \phi_2(X_t)]. \qquad (20)$$

It is not difficult to prove that the result of Theorem 2.3
applies here.
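
The two estimators of method C are easy to compute from a sample. The following is a minimal sketch, assuming a normal $f$ so that $c$ is the upper quartile of the standard normal distribution; the helper name `median_iqr_estimates` is illustrative, and the quartiles use the default method of Python's `statistics.quantiles`, which is only one of several common sample-quartile conventions.

```python
import statistics

def median_iqr_estimates(sample, c):
    """theta1_hat = sample median; theta2_hat = interquartile range / (2c)."""
    q1, _, q3 = statistics.quantiles(sample, n=4)
    return statistics.median(sample), (q3 - q1) / (2.0 * c)

# For a normal f, c solves F_0(c) = 3/4 (roughly 0.6745).
c_normal = statistics.NormalDist().inv_cdf(0.75)
```

For data from a normal law with mean 5 and standard deviation 2, calling `median_iqr_estimates(data, c_normal)` should return values close to 5 and 2 for moderately large samples.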


4. A FEW EXAMPLES

In this section we derive expressions for $\Lambda$ and
$\Lambda^*$ for a few special cases.


I. Assume that {x} is Gaussian with E(X,) = i, V(X.) Se 


) = e(v) = OG ne) for some € > 0, and 


Cov(X, Xi 


co 
1+ 2 } Gos vy A p(v) > 0 “for all. 4, (-17 =A <1). ‘The last 
v=1 
two conditions guarantee strong mixing property of {x, } with 
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ee) 


mixing coefficient a(v) = O(v (see Theorem 10.1 and 10.2 


in Rozanov, 1967). 


n 
ae X “1 i = ina 
(a) Let wp=Xe=n ) X- This leads to HL Xx. u and 


t=1 
A* = \ - yPP! (21) 
Snedee yol™ Poona eer kaetDiy, aes we, SO 
i a ak aaa 4977 "ts ay iP; 
z c, c, /ul, A,,(0) = -(p.p sil Lf< Pod -¥5>A ¢) (0) oreo 
ie S- ft Soest FAY a: + 


Sorte ees (u-1) * 1/2 
d=) SO Gae ci? (a,) - > (a, _1)> ox) = (27) 


2 
exp(-x /2), a, = a, — Us Py = (a, ) - (a, _,) = Cio o(x) = 


1/2 


x 
ip o(y)dy, P' = [k,---k J, k, If p(v) = 0. for all 


SB ha 
$v \ne 0$ then $\gamma = 1$ and $\Lambda = [\lambda_{ij}(0)]$.
This result was first derived by Chernoff and Lehmann (1954). If
$\{X_t\}$ is Gaussian-Markov with $\rho(v) = \rho^v$,
$|\rho| < 1$ $(v \ge 0)$ then $\gamma = (1 + \rho)/(1 - \rho)$
and $d_m = \rho^m/(1 - \rho^m)$.


(b) Let 6 = median of the sample X 


prrssX» Then H, = 
Li2 
(27) (1/2 - a(x, - H)) (see (19)) and 
Ae = A BA ~ APS + PREY, (22) 
A i ‘= = 
where A, P are as in (21), A [a,,---4,,] where aig 


=] foe 
a,,(0) - p, / A boat Cy ama! C2meL)!, BL = (-1)™(2m)! /mt2™, 


and 


ll ei TO 
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1/2 
(mp, /2) » ifa, , 20, 
ECO Ip) 1e (leer pe BO (ie) LE cee eee, 
li i i i-l’*? i-1 le 
1/2 
(mp ,/2) , if a, <0. 


And, finally,

$$R = \pi/2 + 2 \sum_{v=1}^{\infty} \arcsin \rho(v).$$


II. Let $\{X_t\}$ be Gaussian with $E(X_t) = \mu$,
$V(X_t) = \sigma^2$ and covariance function same as in I above.
Let $\hat\mu = \bar X$ and


n 
pee ince } (X% - %) 2721/2, Then oH'.=° [X. — ups { (X_(s A ke = 
tal t ~t t t 
o-}720] and 
Ae = A - PDF’ (23) 


where A is the same as in (21), P [p,,1> Pi 7 ~(o(a,) - 


(0, _1))/0, Pyy = -(a,o(a,) - a, 1 (0, _)))/0, a, = (a, - W/o, 


@e< i-<5r)4-Do= 0? /2( oY el. Y is as defined in (21) and 
v= 3 Ag GE If wu is known and we need to estimate only 
ver 
oO then 
A*x = A - yPP! (24) 


a 2 
where y = 0 6/2 and P’ = [Payee Py ]> with Pi 7 -[o, o(a,) - 


a, (a, 1/9, a, = (a, - u) /o. 
III. Let $\{X_t\}$ be a Cauchy-Markov process defined by

$$X_t = \rho X_{t-1} + W_t, \qquad (25)$$

$-\infty < t < \infty$, $|\rho| < 1$, where $\{W_t\}$ is a
sequence of i.i.d. r.v.'s with d.f. $F$, and
$F(w) = 1/2 + \arctan w/\pi$, $(-\infty < w < \infty)$. Note that
$\{X_t\}$ is a linear process satisfying the conditions of
Theorem 2.1'. It is not difficult to establish that for $v > 0$,
Vv ; Tig 
cs <= = 
lP,, Ps, | Mvp where Pp |p| and M is a finite 


positive constant not depending on v or pp. It is also easy to 
see that for v > 0 


hes 
=ii2 fu 


gaty 


A ™ = (p;P,) Ee Rs - Feueees ade 


2 


where w,(x3p) = [(1 - 99)" + (a, - o'xS(a,_) - px)I/(a, - a, 4), 


f(x) = F'(x), An, ey = Pls and a, = (a, - u)/o. Also for 


0 
VeEswOe ander) Ste oe ai.) = ~mp, 1? ; £ (x) [F (w, (x30) - 
=—CO 
oe 
=¥/2 03 Vv 
F(w, (x3 9)) Idx, a,4¢-v) = Py f F(x) arestan({o x/Y_)dx, 
Pre1 
emery s eh 
te * ) Po 
wi? : 
™; /2 wit a, 2 og 
a. (0) 2 4 pee (apn le ee ane 
Li Bs ay a,_)1, if at eS << (0) < se i 
Wee 
"py! f2 sult Oh gout, 
1 


foe) 


Also for all) v > 0, Rtv) =F f f(x) are tan(o'x/y )dx = R(-v), 
0 oy 
R(O) = 17/4, 
TABLE 1. Values of the eigenvalues $(1 \le j \le r)$ for
Gaussian-Markov process for selected values of $\rho$ and $r$
$(p_1 = p_2 = \cdots = p_r = r^{-1})$. [Table entries not
recoverable.]
5. NUMERICAL RESULTS


This section contains some numerical details which will
illustrate the effect of the structural dependence of $\{X_t\}$
on the validity of the $X^2$- and $\hat X^2$-tests, where $X^2$
and $\hat X^2$ are defined in (3) and (4) respectively. Since
there are infinite variations in the choice of $r$, $p_i$
$(1 \le i \le r)$ and $G$, it is difficult to decide which few
individual cases we should consider numerically. A simple choice
would be the case where $G$ is $N(\theta,1)$,
$p_1 = p_2 = \cdots = p_r = r^{-1}$ and
$|\rho(v)| \le M\rho_0^{\,v}$ $(v \ge 0)$ where $M$ is a finite
positive constant and $0 < \rho_0 < 1$. Any $\{X_t\}$ which is
stationary Gaussian and is autoregressive of a finite order will
be a linear process satisfying the property of $\rho(v)$, and
will be a suitable example to consider. Again, since the
asymptotic validity of the $X^2$- and $\hat X^2$-tests in such
cases depends only on $d_m$ (see (21)), we may adequately
represent the general Gaussian structure by the simple
Gaussian-Markov process with $\rho(v) = \rho^v$ $(v \ge 0)$,
whereby $d_m = \rho^m/(1 - \rho^m)$. For Case I we assume that
$\{X_t\}$ is Gaussian-Markov with $E(X_t) = 0$, $V(X_t) = 1$.
For the corresponding Case II we assume that $\{X_t - \mu\}$ is
Gaussian-Markov with the property as in Case I and we estimate
$\mu$ by $\bar X$. The numerical values of $T_j$, $T_j^*$
$(1 \le j \le r)$ for selected values of $\rho$ and $r$ are given
in Table 1.


It appears that us are most stable and contribute more to 
the validity of the chi-square test than = (1-<_4--¢-#)5--pover 


a wide range of values of 0. 
AN ASYMPTOTICALLY DISTRIBUTION-FREE
GOODNESS-OF-FIT TEST FOR FAMILIES OF STATISTICAL
DISTRIBUTIONS DEPENDING ON TWO PARAMETERS


FORTUNATO PESARIN * 


Institute of Statistics 
University of Padova 
35100 Padova, Italy 


SUMMARY. This paper concerns a general goodness-of-fit test for 
families of statistical distributions depending on two nuisance
parameters and satisfying some general conditions. The null 
distribution of the test statistic does not depend on the nui- 
sance parameters and is asymptotically distribution-free. The 
paper gives a short table of its critical values and a table of 
its power against some alternatives. 


KEY WORDS. Asymptotically distribution-free test, goodness-of- 
fit test, nuisance parameters, families of statistical distri- 
butions, weighted Cramér-von Mises test. 


1. INTRODUCTION 


Consider a family $V_F$ of continuous random variables with
cumulative distribution functions $F(x;\theta_1,\theta_2)$, where
the functional form of $F$ is fixed and $(\theta_1,\theta_2)$ is
a pair of real nuisance parameters. Often, but not always,
$\theta_1$ and $\theta_2$ are location and scale parameters,
respectively.


We will present a general goodness-of-fit test for the null
hypothesis $H_0$ that $G(x)$ belongs to $V_F$, $G(x)$ being the


* 
This work is a part of research supported by the Italian National 
Research Council (Grant n. CT 77.0153.10). 
c.d.f. of the observable r.v. $X$. The null distribution of the
test statistic (a) is $(\theta_1,\theta_2)$-independent for every
sample size $n > 2$ and (b) is asymptotically $V_F$-independent,
i.e. the test is asymptotically distribution-free.


2. CONDITIONS AND TEST

The validity of the test assumes the following conditions:
i) there are two monotone continuous real functions $q$ and
   $g$, not depending on $(\theta_1,\theta_2)$, and two real
   functions $a$ and $b$, depending on $(\theta_1,\theta_2)$,
   such that $q(F(x;\theta_1,\theta_2)) = a + b \cdot g(x)$ for
   every member of $V_F$;
ii) the relation between $(a,b)$ and $(\theta_1,\theta_2)$ is one
   to one and the functions $\theta_1(a,b)$ and $\theta_2(a,b)$
   are continuous;
iii) the derivative $q'(u) = dq(u)/du$ is almost surely uniformly
   continuous and different from 0, with respect to every
   member of $V_F$, in the interval [0,1].

Let $\Omega_F$ denote the set of all r.v.'s having the variances
of the transforms $q(F(X;\theta_1,\theta_2))$ and $g(X)$ finite
and positive, for all $(\theta_1,\theta_2)$. In this context the
functions $q$ and $g$ characterize the family $V_F$ and linearize
the c.d.f.'s of its members.

Families of distributions satisfying the properties i) to
iii) include, for instance: Normal, Lognormal, Logistic,
Pareto, Weibull, Exponential, Gumbel, etc.

In the case where $(\theta_1,\theta_2)$ are location and scale
parameters, so that
$F(x;\theta_1,\theta_2) = F_0((x-\theta_1)/\theta_2)$, we have
$q = F_0^{-1}$, $g(x) = x$, $a = -\theta_1/\theta_2$,
$b = 1/\theta_2$.
Let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ be the order
statistics corresponding to a random sample drawn from $G(x)$ and
consider the quantities:

$$q_i = q(i/(n+1)), \quad q_i' = q'(i/(n+1)), \quad g_i = g(X_{(i)}),$$

$$w_i = (n+1)(n+2)/[i(n+1-i)(q_i')^2], \quad i = 1, \ldots, n.$$
The test we propose is based on the statistic

$$D^* = \sum_{i=1}^{n} (q_i - \hat a - \hat b\, g_i)^2 w_i = (1 - r^2) D_q,$$

where $\hat a$ and $\hat b$ are the weighted least squares
estimates and

$$D_q = \sum q_i^2 w_i - (\sum q_i w_i)^2/\sum w_i, \qquad
  D_g = \sum g_i^2 w_i - (\sum g_i w_i)^2/\sum w_i,$$

$$C_{qg} = \sum q_i g_i w_i - (\sum q_i w_i)(\sum g_i w_i)/\sum w_i, \qquad
  r^2 = C_{qg}^2/(D_q D_g)$$

are respectively the variance of $q_i$, the variance of $g_i$,
the covariance between $q_i$ and $g_i$ and the squared linear
correlation coefficient, all weighted with $w_i$. The statistic
$D^*$ corresponds to the weighted residual variance from the
least squares linear regression.

The test statistic we propose is given by

$$D = D^*(n+1)/(n-2).$$

Obviously large values of $D$ are significant.
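
As an illustration, the statistic can be computed in a few lines for the two-parameter exponential family, for which $q_i = \ln[(n+1)/(n+1-i)]$, $w_i = (n+2)(n+1-i)/[i(n+1)]$ and $g_i = X_{(i)}$. This is a hedged sketch rather than the author's program: it reads $D^*$ as the weighted residual sum of squares $(1 - r^2)D_q$ without further normalization, and the function name is hypothetical.

```python
import math

def d_statistic_exponential(sample):
    """D for the two-parameter exponential family (q_i, w_i, g_i as above)."""
    x = sorted(sample)
    n = len(x)
    q = [math.log((n + 1) / (n + 1 - i)) for i in range(1, n + 1)]
    w = [(n + 2) * (n + 1 - i) / (i * (n + 1)) for i in range(1, n + 1)]
    g = x                                   # g(x) = x for this family

    sw = sum(w)
    sq = sum(qi * wi for qi, wi in zip(q, w))
    sg = sum(gi * wi for gi, wi in zip(g, w))
    d_q = sum(qi * qi * wi for qi, wi in zip(q, w)) - sq * sq / sw
    d_g = sum(gi * gi * wi for gi, wi in zip(g, w)) - sg * sg / sw
    c_qg = sum(qi * gi * wi for qi, gi, wi in zip(q, g, w)) - sq * sg / sw

    r2 = c_qg * c_qg / (d_q * d_g)          # weighted squared correlation
    d_star = (1.0 - r2) * d_q               # weighted residual sum of squares
    return d_star * (n + 1) / (n - 2)
```

Because the data enter only through $r^2$, the value is unchanged when every observation is shifted and rescaled, which is the finite-sample parameter-free property claimed for the test.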


3. THE TEST'S PROPERTIES


Assume $H_0$ is true. We first establish the properties
mentioned at the end of Section 1.

To prove property (a) we simply note that $r^2$ is unchanged
under nondegenerate linear transforms so that it is
$(a,b)$-independent. Then it is also
$(\theta_1,\theta_2)$-independent by conditions i) and ii); and
so $D$ is.

To prove property (b) we consider the r.v.
$q(F(X_{(i)};\theta_1,\theta_2)) = a + b \cdot g(X_{(i)}) = q(U_i')$,
$i = 1, \ldots, n$, where $U_i'$ are, by the continuity of $F$,
the order statistics from the uniform distribution $U(0,1)$. By
the uniform continuity of $q'$, if $n$ is sufficiently large, we
have that $q(U_i') \approx q_i + (U_i' - i/(n+1)) \cdot q_i'$,
$i = 1, \ldots, n$. So $q(U_i')$ is, asymptotically, a linear
transform of $U_i'$ and their joint density function is
proportional to that of $U_i'$. Thus the test's components

$$(q_i - a - b \cdot g_i)^2 w_i \approx (U_i' - i/(n+1))^2 (n+1)(n+2)/[i(n+1-i)]$$

are jointly asymptotically not dependent in distribution on $q$
and then not dependent on $F$. This assures that $D$ is
asymptotically $V_F$-independent in distribution.
TABLE 1: Critical values for some selected families calculated
by a Monte Carlo procedure with 2000 replications. Entries shown
as … are not legible in the source.

V_F          n:  10      20      40      100     200     α
Exponential      .840    .952    .918    .914    .946    .05
                 1.199   1.293   …       …       …       .01
Logistic         .705    .800    .762    …       .688    .05
                 …       …       …       …       .952    .01
Normal           .688    …       …       …       .684    .05
                 .999    1.100   …       .974    .949    .01
Weibull          .687    .785    .754    .706    .687    .05
                 1.002   1.108   1.075   1.002   .953    .01


TABLE 2: Explicit expressions for the quantities $q_i$, $w_i$,
$g_i$ for the families: a) Exponential, b) Logistic, c) Normal,
d) Weibull, e) Pareto, f) Gompertz. $\Phi$ is the standard normal
c.d.f. Entries shown as … are not legible in the source.

V_F   q_i                          w_i                        g_i
a)    ln[(n+1)/(n+1-i)]            (n+2)(n+1-i)/[i(n+1)]      X_(i)
b)    …                            …                          …
c)    z_i : Φ(z_i) = i/(n+1)       …                          X_(i)
d)    …                            …                          …
e)    ln[(n+1)/(n+1-i)]            (n+2)(n+1-i)/[i(n+1)]      ln X_(i)
f)    ln[ln((n+1)/i)]              …                          X_(i)


TABLE 3: Power of D with α = .05 for the families: a)
Exponential, b) Logistic, c) Normal, d) Weibull; against some
indicated alternatives. Values calculated by a Monte Carlo
procedure with 500 replications. Dash means test not applicable.


Alte ; n = 20 n = 40 

Berns ive Vy a) b) c) d) a) b) c) d) 
SE ee 00) Sen unnNee t Seb Ali ereeamiwm eric 
Exponential -050 nO ZO OU S045 5125 pe OF2 =s050 
Logistic 802 -042 .144 = 990 Ode woo - 
Normal rt be -028 .046 - ~990™ 7046" 2043 ~ 
Weibull -050 O90) 720) =. 050 3045" S25)" 839.722 050 
Lognormal 248 2 Ge tOO One G - 368 § "3396 © 7992 = 3358 
Chi-square (1) 460" “SreOl™ 7962" 0G o PTR S2EG OR O96. PY36 
Half-normal - 080 OZ 25s ee29 Damen 0) L4G" 054 sO? Omens & 


Half-logistic -026 -044 .496 .084 -082 .100 .838 .094 
D is a consistent test against
$H_1 = \{G \in \Omega_F - V_F\}$. To prove this we observe the
analogy between $D$ and the weighted Cramér-von Mises
goodness-of-fit test, which is asymptotically independent of the
nuisance parameters and is based on the statistic

$$A^2 = \sum_{i=1}^{n} [F(X_{(i)};\hat\theta_1,\hat\theta_2) - i/n]^2/[\hat F_i(1 - \hat F_i)],$$

where $(\hat\theta_1,\hat\theta_2)$ is a "good" estimate of
$(\theta_1,\theta_2)$. We know that the test based on $A^2$ is
consistent, provided that $(\theta_1,\theta_2)$ admits a
consistent estimator. We have that the continuity of $q$, the
consistency of the least squares estimators under our regularity
conditions, the continuity of $\theta_1(a,b)$ and
$\theta_2(a,b)$, and the consistency of the weighted Cramér-von
Mises test assure the consistency of $D$.

The set of alternatives not contained in $H_1$ can be divided
into two subsets: the one for which the test is not consistent
and the other for which it is not applicable. This last
situation occurs for instance in cases where $g(x)$ is not real
or is not defined on some set of real numbers having positive
probability with respect to some members of $V_F$.

From Table 1 we observe that the critical values are
practically n-invariant and substantially the same, with respect
to different families of statistical distributions, for even
moderate values of n.
CONDITIONALITY PROPERTIES FOR THE BIVARIATE
LOGARITHMIC DISTRIBUTION WITH AN APPLICATION TO
GOODNESS OF FIT


A. W. KEMP 


School of Mathematical Sciences 
University of Bradford 
Bradford, BD7 1DP ENGLAND


SUMMARY. The paper gives new modes of genesis for the multi- 
variate and homogeneous bivariate logarithmic distributions. 
Marginality and conditionality properties are derived (including, 
e.g.; yaks > Xs X|¥ - X =n), and are applied to the problem 
of grouping frequencies for a chi-square goodness-of-fit test 
for the homogeneous bivariate logarithmic distribution. The 
proposed procedure is objective and impartial with respect to 
rows and columns. 


KEY WORDS. Bivariate logarithmic series distribution, multivariate 
logarithmic series distribution, modes of genesis, chi-squared 
goodness-of-fit, objective grouping, homogeneous distributions. 


1. INTRODUCTION 


The multivariate logarithmic series distribution (LSD-m) was 
introduced into the literature by Khatri (1959). Subsequent 
papers, Patil and Bildikar (1967), Chatfield, Ehrenberg and Good- 
hardt (1966), see also Johnson and Kotz (1969), stressed its 
close relationship to the multivariate negative binomial distri- 
bution (NBD-m), just as the famous Fisher, Corbett and Williams 
(1943) paper emphasized the affinity between the univariate log- 
arithmic series distribution (LSD-l1) and the univariate negative 
binomial distribution (NBD-1). 


This paper deals with the bivariate logarithmic series
distribution (LSD-2) and the bivariate negative binomial
distribution (NBD-2) as bivariate forms of LSD-m and NBD-m. The
notation for these and other closely related distributions is
given in Section 2, which also explains the concept of
homogeneity for a multivariate discrete distribution. Section 3
is devoted to the genesis of LSD-m and LSD-2; we shall show that
as well as NBD-m and NBD-2 based models there are many other
models not related to the Fisherian-type limiting process.
Marginal and conditional distributions for LSD-2 are given in
Section 4; these results concern not only $X \mid Y = y$ and
$X \mid X + Y = n$, but also
XIX Soc RR ey pete IR a la 


Finally, Section 5 shows how these results can be applied 
to the problem of carrying out a chi-squared goodness-of-fit test 
for a bivariate distribution. A grouping procedure is proposed 
which is entirely objective and moreover is impartial with 
respect to rows and columns. A numerical illustration is given 
using data known to be bivariate logarithmic. 


2. NOTATION 


Consider $\mathbf{x}$, $\mathbf{p}$, $\mathbf{z}$ and
$\mathbf{d} \in R^m$. Let $\mathbf{x} = (X, Y, \ldots)$ be an
m-component random vector where all components have non-negative
integer support. Let $\mathbf{p} = (a, b, \ldots)$ be an
m-component parameter vector with components $c$ satisfying
$0 < c < 1$. Let $\mathbf{z} = (s, t, \ldots)$ be an m-component
vector of generating variables, and let
$\mathbf{d} = (1, 1, \ldots)$.


$\mathbf{x}$ has the multinomial distribution (MD-m) if its
p.g.f. is of the form

$$(1 - \mathbf{p}\cdot\mathbf{d} + \mathbf{p}\cdot\mathbf{z})^n$$

where $\mathbf{p}\cdot\mathbf{d} < 1$ and $n$ is a positive
integer. The binomial distribution is MD-1.


$\mathbf{x}$ has the singular multinomial distribution (SMD-m) if
its p.g.f. is of the form $(\mathbf{p}\cdot\mathbf{z})^n$, where
$\mathbf{p}\cdot\mathbf{d} = 1$, and $n$ is a positive integer.


$\mathbf{x}$ has the multivariate negative binomial distribution,
also called the negative multinomial distribution (NBD-m), if its
p.g.f. is of the form

$$(1 - \mathbf{p}\cdot\mathbf{d})^k/(1 - \mathbf{p}\cdot\mathbf{z})^k
  = (1 + \mathbf{q}\cdot\mathbf{d} - \mathbf{q}\cdot\mathbf{z})^{-k},$$

where $0 < k$, $\mathbf{p}\cdot\mathbf{d} < 1$ and
$\mathbf{q} = \mathbf{p}/(1 - \mathbf{p}\cdot\mathbf{d})$. The
univariate negative binomial distribution is NBD-1, whilst the
bivariate negative binomial distribution is NBD-2. Note that this
is the Bates and Neyman (1952) form of the distribution,
introduced originally by Guldberg (1934), see also Sibuya,
Yoshimura and Shimizu (1964); it is not the more general version
of Edwards and Gurland (1961).


$\mathbf{x}$ has the multiple Poisson distribution (MP-m) if its
p.g.f. is of the form

$$\exp\{\lambda(\mathbf{p}\cdot\mathbf{z} - \mathbf{p}\cdot\mathbf{d})\}, \quad 0 < \lambda.$$

This is just the convolution of m independent Poisson
distributions.


$\mathbf{x}$ has the multivariate logarithmic series distribution
(LSD-m) if its p.g.f. is of the form

$$\log(1 - \mathbf{p}\cdot\mathbf{z})/\log(1 - \mathbf{p}\cdot\mathbf{d})
  \quad \text{where } \mathbf{p}\cdot\mathbf{d} < 1.$$

The univariate and bivariate logarithmic distributions are LSD-1
and LSD-2, where LSD-2 is the distribution studied by Patil and
Bildikar (1967) and Chatfield et al. (1966). It is not the
bivariate logarithmic distribution of Patil and Joshi (1968).
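
To make the LSD-2 definition concrete, its probabilities can be read off by expanding the p.g.f. $\log(1 - as - bt)/\log(1 - a - b)$ in powers of $s$ and $t$, which gives $P(X = x, Y = y) = \{(x+y-1)!/(x!\,y!)\}\, a^x b^y / \{-\log(1 - a - b)\}$ for $(x,y) \ne (0,0)$. The sketch below simply evaluates this expansion; the function name is illustrative.

```python
import math

def lsd2_pmf(x, y, a, b):
    """P(X = x, Y = y) for LSD-2 with p.g.f. log(1 - a*s - b*t) / log(1 - a - b)."""
    if x + y == 0:
        return 0.0                          # the origin carries no mass
    n = x + y
    coeff = math.comb(n, x) / n             # equals (x + y - 1)! / (x! y!)
    return coeff * a ** x * b ** y / (-math.log(1.0 - a - b))
```

Summing `lsd2_pmf` over a sufficiently large grid should return a total close to one, and weighting the terms by $s^x t^y$ should reproduce the p.g.f.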


$\mathbf{x}$ has the modified multivariate logarithmic
distribution (MLSD-m) if its p.g.f. is of the form

$$1 - \theta + \theta \log(1 - \mathbf{p}\cdot\mathbf{z})/\log(1 - \mathbf{p}\cdot\mathbf{d}),
  \quad \mathbf{p}\cdot\mathbf{d} < 1, \ 0 < \theta < 1.$$

Again the univariate and bivariate modified logarithmic
distributions are MLSD-1 and MLSD-2.


The multivariate version of the convolution of a geometric
distribution with either a binomial with $n = 1$ or alternatively
a pseudo-binomial type I (with exponent parameter equal to unity)
will be called the multivariate binomial convolution distribution
(BCD-m) and will be defined as having the p.g.f.

$$\{\theta + (1 - \theta)\mathbf{p}\cdot\mathbf{z}\}/\{1 + \phi - \phi\,\mathbf{p}\cdot\mathbf{z}\}$$

where $\mathbf{p}\cdot\mathbf{d} = 1$, $0 < \theta < 1 + \phi$,
$0 < \phi$. In the univariate case (BCD-1), the distribution is
the convolution of a geometric with a binomial when
$0 < \theta < 1$, $0 < \phi$, and of a geometric with a
pseudo-binomial type I when $1 < \theta < 1 + \phi$; see Kemp
(1979).


A multivariate discrete distribution will be said to be
homogeneous if its p.g.f. contains the generating variables only
in the form of a linear function within a function of a function.
All the distributions defined above are therefore homogeneous.
Note that NBD-2 is homogeneous whilst the Gurland-Edwards
bivariate negative binomial is not. Also the usual bivariate
Poisson with p.g.f.

$$\exp\{a(s-1) + b(t-1) + c(st-1)\}$$

is not homogeneous.


Section 4 onwards will use $G_{X,Y}(s,t) = H(as + bt)$ as the
p.g.f. of an homogeneous bivariate distribution of $X$ and $Y$.
$G_X(s)$ will denote the marginal distribution of $X$, and
$G_{X|Y=n}(s)$ will denote the distribution of $X$ conditional on
$Y$ taking the value $n$.


3. GENESIS OF LSD-m AND LSD-2 


Many modes of genesis for LSD-m are extensions of those for 
LSD-1. Instances of LSD-2 and LSD-m will occur therefore in the 
many areas (ecology, linguistics, marketing, meteorology, etc.) 
where LSD-1 has been found appropriate. For lucid reviews of 
models for LSD-1 see Nelson and David (1967) and Boswell and 
Patil (1971). 


3.1 Fisher's limit model. Analogously to the univariate case,
LSD-m is the limit as $k \to 0$ of the origin-truncated NBD-m,
see Patil and Bildikar (1967). Hence, any model which leads to
NBD-m can be expected to provide a model for LSD-m; for models
for NBD-m see e.g. Bates and Neyman (1952), Sibuya, Yoshimura
and Shimizu (1964) and Neyman (1965).
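
The limit can be checked numerically from the p.g.f.s given in Section 2. The sketch below assumes that truncation at the origin means removing the mass $(1 - \mathbf{p}\cdot\mathbf{d})^k$ at $\mathbf{x} = \mathbf{0}$ and renormalizing; the function names are illustrative.

```python
import math

def nbd_truncated_pgf(k, p, z):
    """p.g.f. of the origin-truncated NBD-m at z, for parameter vector p."""
    pd = sum(p)                                  # p . d with d = (1, 1, ...)
    pz = sum(pi * zi for pi, zi in zip(p, z))    # p . z
    p0 = (1.0 - pd) ** k                         # probability of the origin
    full = p0 * (1.0 - pz) ** (-k)               # untruncated NBD-m p.g.f.
    return (full - p0) / (1.0 - p0)

def lsd_pgf(p, z):
    """p.g.f. of LSD-m: log(1 - p.z) / log(1 - p.d)."""
    pd = sum(p)
    pz = sum(pi * zi for pi, zi in zip(p, z))
    return math.log(1.0 - pz) / math.log(1.0 - pd)
```

Evaluating both functions at the same point with a very small `k` should give nearly identical values, with the discrepancy shrinking roughly in proportion to `k`.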


3.2 Fisher's mixture model. The NBD-1 is the gamma-mixed Poisson
distribution where the gamma distribution has parameter $k$. When
the parameter $k \to 0$ there are two possibilities: either the
Poisson can be truncated after mixing, or it can be truncated
before mixing. The multivariate analogues are as follows:


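3.2a Truncation after mixing. Let the mixing distribution be the
exponential integral distribution with probability density
function (p.d.f.)

$$e^{-w/\alpha}/\{w E_1(\epsilon/\alpha)\}, \quad \epsilon < w < \infty.$$

The mixed MP-m distribution has then the p.g.f.

$$\int_{\epsilon}^{\infty} \exp\{w(\mathbf{p}\cdot\mathbf{z} - \mathbf{p}\cdot\mathbf{d})\}
  \frac{e^{-w/\alpha}}{w E_1(\epsilon/\alpha)}\, dw
  = E_1[\epsilon/\alpha - \epsilon(\mathbf{p}\cdot\mathbf{z} - \mathbf{p}\cdot\mathbf{d})]/E_1(\epsilon/\alpha).$$

Truncation of the origin then gives the multivariate distribution
with p.g.f.

$$\frac{\gamma + \log(\epsilon/\alpha + \epsilon\mathbf{p}\cdot\mathbf{d} - \epsilon\mathbf{p}\cdot\mathbf{z}) + S_1
        - \gamma - \log(\epsilon/\alpha + \epsilon\mathbf{p}\cdot\mathbf{d}) - S_2}
       {\gamma + \log(\epsilon/\alpha) + S_3
        - \gamma - \log(\epsilon/\alpha + \epsilon\mathbf{p}\cdot\mathbf{d}) - S_2}
  = \frac{\log\{1 - \mathbf{p}\cdot\mathbf{z}/(1/\alpha + \mathbf{p}\cdot\mathbf{d})\} + S_1 - S_2}
         {\log\{1 - \mathbf{p}\cdot\mathbf{d}/(1/\alpha + \mathbf{p}\cdot\mathbf{d})\} + S_3 - S_2},$$

where $E_1(\cdot)$ is the exponential integral, $\gamma$ is
Euler's constant,

$$S_1 = \sum_{n=1}^{\infty} (-\epsilon/\alpha - \epsilon\mathbf{p}\cdot\mathbf{d} + \epsilon\mathbf{p}\cdot\mathbf{z})^n/n!\,n,
  \qquad S_2 = [S_1]_{\mathbf{z}=\mathbf{0}}, \qquad S_3 = [S_1]_{\mathbf{z}=\mathbf{d}}.$$

Finally letting $\epsilon \to 0$ gives LSD-m. More informal
univariate versions of this model have been given by Kendall
(1948), McCloskey (1965) and Boswell and Patil (1971).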


3.2b Truncation before mixing. Suppose now that the mixing
distribution has p.d.f. 
ere ei his 
1F, G3 2; we /a,F, (1, 1p Zia) 
ter. Dyes are log (i=a)) , Os w <3, OF <a 


The mixed origin-truncated MP-m distribution will then have the
p.g.f.


CO aNPL2 - 1 (e” a asus 
Uy Fu OE CBRE -eogCLE a) ater EUS Pelee /AeeLt oa 


The univariate version of this model was first suggested by 
Kendall (1948) and was given by Boswell and Patil (1971). 


3.3 Kendall's birth-and-death models. Kendall showed that LSD-1
is a limiting form as $k \to 0$ of the immigration processes
with p.g.f.'s

$$ [(\beta - \mu)/(\beta\Lambda - \mu - \beta\Lambda s + \beta s)]^{k/\beta}
   \quad\text{where } \Lambda = e^{(\beta-\mu)t}, $$

and

$$ (1 + \beta t - \beta t s)^{-k/\beta} \quad\text{where } \beta = \mu, $$

as in the Fisher limit model.


Kendall also examined the birth-and-death processes (no
immigration) having BCD-1 with p.g.f.

$$ \frac{\mu\Lambda - \mu - \mu\Lambda s + \beta s}{\beta\Lambda - \mu - \beta\Lambda s + \beta s}
   \quad\text{where } \Lambda = e^{(\beta-\mu)t}, $$

and

$$ \frac{\beta t + (1 - \beta t)s}{1 + \beta t - \beta t s}
   \quad\text{where } \beta = \mu; $$

for other processes leading to this distribution, see e.g. Kemp
(1979). Kendall showed that if t has a uniform distribution on
(-T, 0) then the resultant distribution is MLSD-1 with p.g.f.

$$ 1 + \frac{1}{\beta T}\,\log(1 - x)\left[1 - \frac{\log(1 - xs)}{\log(1 - x)}\right] $$

where

$$ x = (\beta\Lambda - \beta)/(\beta\Lambda - \mu), \quad \Lambda = e^{(\beta-\mu)T}, \quad \beta \ne \mu, $$

$$ x = \beta T/(1 + \beta T), \quad \beta = \mu. $$

More generally, if z has the BCD-m with p.g.f.

$$ (\mu\Lambda - \mu - \mu\Lambda p'z + \beta p'z)/(\beta\Lambda - \mu - \beta\Lambda p'z + \beta p'z) $$

or

$$ (\beta t + p'z - \beta t\,p'z)/(1 + \beta t - \beta t\,p'z) $$

where $p'd = 1$, $\Lambda = e^{(\beta-\mu)t}$ and t is uniform on (-T, 0), then
the resultant distribution is MLSD-m.


3.4 Kendall-Patil binomial mixture. The MLSD-m has been presented
in Section 2 as a binomial (n = 1) mixture of LSD-m. As Patil
and Bildikar (1967) have pointed out, MLSD-m is also a mixed
MD-m where the MD-m parameter n has an LSD-1. This is a
multivariate analogue of a result given in Kendall (1948).


3.5 Quenouille's Poisson-log negbin relationship. Quenouille
(1949) gave an important result relating the Poisson, logarithmic
and negative binomial distributions. He showed that NBD-1 arises
as the sum of n independent LSD-1 variables where n has MP-1.
Recently Kemp (1978) has shown that the mathematics underlying 
this relationship are the same as those for the Fisher limiting 
derivation of LSD-1. The mathematical relationship between NBD-1 
and LSD-1 has here two quite distinct interpretations. 


The multivariate analogue of Quenouille's relationship is
that NBD-m is the sum of n independent LSD-m variables where n
has the Poisson distribution, see Philippou and Roussas (1974).
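
Quenouille's univariate relationship can be checked directly on
probability generating functions. The sketch below is illustrative
only: the function names and the values k = 2.5, a = 0.6 are
assumptions, and the NBD-1 is taken in the parametrization
$\{(1-a)/(1-as)\}^k$. A Poisson number of LSD-1 variables has p.g.f.
$\exp[\lambda\{G(s) - 1\}]$, and the two sides agree when
$\lambda = -k\log(1-a)$.

```python
import math

def lsd_pgf(s, a):
    """p.g.f. of LSD-1: log(1 - a*s) / log(1 - a)."""
    return math.log1p(-a * s) / math.log1p(-a)

def nbd_pgf(s, k, a):
    """p.g.f. of NBD-1 in the form ((1 - a)/(1 - a*s))**k."""
    return ((1.0 - a) / (1.0 - a * s)) ** k

def poisson_stopped_lsd_pgf(s, k, a):
    """p.g.f. of a Poisson(lam) number of independent LSD-1 variables."""
    lam = -k * math.log1p(-a)  # Poisson mean that makes the two sides agree
    return math.exp(lam * (lsd_pgf(s, a) - 1.0))

k, a = 2.5, 0.6  # illustrative values only
gaps = [abs(nbd_pgf(s, k, a) - poisson_stopped_lsd_pgf(s, k, a))
        for s in (0.0, 0.25, 0.5, 0.75, 1.0)]
print("largest discrepancy:", max(gaps))
```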


3.6 MLSD-m as a limit distribution. Consider now the NBD-m with
p.g.f.

$$ (1 - b)^k/(1 - b + c\,p'd - c\,p'z)^k, \qquad 1 > b > c\,p'd > 0. $$

Then the amount of probability at the origin is

$$ (1 - b)^k/(1 - b + c\,p'd)^k > (1 - b)^k. $$

Suppose now that the probability at the origin is reduced by an
amount $(1 - b)^k$, before k is allowed to tend to zero in the
usual Fisherian manner. The resultant distribution is then MLSD-m,
since

$$ \lim_{k \to 0} \frac{\{(1 - b)/(1 - b + c\,p'd - c\,p'z)\}^k - (1 - b)^k}{1 - (1 - b)^k}
   = \lim_{k \to 0} \frac{\frac{1}{k}\left[\{(1 - b)/(1 - b + c\,p'd - c\,p'z)\}^k - (1 - b)^k\right]}
                         {\frac{1}{k}\left[1 - (1 - b)^k\right]} $$

$$ = \log(1 - b + c\,p'd - c\,p'z)/\log(1 - b). $$
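
The limit can be checked numerically. In this minimal sketch the
scalars `pz` and `pd` stand for the inner products $p'z$ and $p'd$,
and the values b = 0.7, c = 0.4, pd = 1 and k = 1e-8 are assumptions
chosen only so that 1 > b > c p'd > 0 holds.

```python
import math

def reduced_nbd_pgf(pz, k, b, c, pd=1.0):
    """NBD-m p.g.f. with (1 - b)**k removed from the origin, renormalized."""
    log_g = math.log((1.0 - b) / (1.0 - b + c * pd - c * pz))
    log_q = math.log1p(-b)
    # (g**k - (1-b)**k) / (1 - (1-b)**k), written with expm1 for small k
    return (math.expm1(k * log_g) - math.expm1(k * log_q)) / -math.expm1(k * log_q)

def mlsd_pgf(pz, b, c, pd=1.0):
    """Limiting p.g.f. log(1 - b + c p'd - c p'z) / log(1 - b)."""
    return math.log(1.0 - b + c * pd - c * pz) / math.log1p(-b)

b, c, k = 0.7, 0.4, 1e-8  # illustrative values only
limit_gaps = [abs(reduced_nbd_pgf(u, k, b, c) - mlsd_pgf(u, b, c))
              for u in (0.0, 0.3, 0.6, 1.0)]
print("largest discrepancy:", max(limit_gaps))
```

The value of `mlsd_pgf(0.0, b, c)` is the probability retained at the
origin, which is positive here; that is what makes the limit a
modified rather than an ordinary logarithmic series distribution.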
Consider also BCD-m with p.g.f.

$$ \{\theta + (1 - \theta)p'z\}/\{1 + \phi - \phi\,p'z\}, \qquad 1 \le \theta \le 1 + \phi. $$


Following the univariate treatment by Steutel (1968) and Kemp
(1979), this can be shown to be infinitely divisible. Hence
a k-fold convolution of such distributions will yield a valid
distribution even when k is positive but not an integer. It
is natural to ask whether LSD-m or MLSD-m can arise from it by a
Fisherian-type limiting process. This does not occur; nevertheless
the resultant distribution is closely related to MLSD-m. We have,
remembering that $p'd = 1$,


$$ \{\theta + (1 - \theta)p'z\}/(1 + \phi - \phi\,p'z)
   = 1 - A + A/(1 + \phi - \phi\,p'z) \quad\text{where } A = (1 - \theta + \phi)/\phi. $$

And so, taking $0 < C < \theta/(1 + \phi)$,

$$ \lim_{k \to 0} \frac{\{1 - A + A/(1 + \phi - \phi\,p'z)\}^k - C^k}{1 - C^k}
   = \log[\{1 - A + A/(1 + \phi - \phi\,p'z)\}/C]/\log(1/C) $$

$$ = \frac{\log\{\theta - (\theta - 1)p'z\} - \log(1 + \phi - \phi\,p'z) - \log C}{-\log C}. $$

If $C = \theta/(1 + \phi)$, i.e. all the probability at the origin is
truncated, then the resultant distribution has p.g.f.

$$ \frac{-\log\{1 - \phi\,p'z/(1 + \phi)\} + \log\{1 - (\theta - 1)p'z/\theta\}}
        {-\log\{1 - \phi/(1 + \phi)\} + \log\{1 - (\theta - 1)/\theta\}}. $$

Note that $(\theta - 1)/\theta < \phi/(1 + \phi)$, since $\theta < 1 + \phi$.


These limit results involving MLSD-m appear to be completely 
new. 


3.7 Mixed shifted-geometric model. Let us return to Kendall's
mixed birth-and-death process given in Section 3.3. When $\beta = \mu$,
the birth-and-death process gives rise to an unmodified logarithmic
distribution only when $\beta T = \log(1 + \beta T)$. This is a very
artificial restriction. However, when $\mu = 0$, the initial
distribution is always shifted geometric and the resultant
distribution is always LSD-1, see Kemp's (1981) algorithm BM
for generating pseudo-random variables from LSD-1. The
multivariate analogue of this univariate model is

$$ \int_0^a \frac{(1 - v)\,p'z}{1 - v\,p'z}\cdot\frac{dv}{(v - 1)\log(1 - a)}
   = \frac{\log(1 - a\,p'z)}{\log(1 - a\,p'd)} $$

where $p'd = 1$.
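
The univariate form of this mixture suggests a simple way of
generating LSD-1 variates: draw v from the mixing density
$1/\{(v-1)\log(1-a)\}$ on (0, a), then draw a shifted geometric
variate with parameter v. The sketch below follows that recipe under
stated assumptions; the function name, seed and sample size are
illustrative, and it is not claimed to reproduce the algorithm in
Kemp (1981) step for step.

```python
import math
import random

def sample_lsd1(a, rng):
    """Draw one LSD-1 variate via the mixed shifted-geometric representation."""
    u = rng.random()
    # Inverse of the mixing c.d.f. F(v) = log(1 - v)/log(1 - a) on (0, a)
    v = 1.0 - (1.0 - a) ** u
    if v <= 0.0:
        return 1  # degenerate shifted geometric: all mass at 1
    w = 1.0 - rng.random()  # uniform on (0, 1], so log(w) is finite
    # Shifted geometric by inversion: P(X = x | v) = (1 - v) * v**(x - 1)
    return 1 + int(math.log(w) / math.log(v))

rng = random.Random(2024)  # fixed seed, for reproducibility only
a = 0.5
draws = [sample_lsd1(a, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
theoretical_mean = -a / ((1.0 - a) * math.log1p(-a))
print(sample_mean, theoretical_mean)
```

Integrating the shifted geometric probability against the mixing
density gives $-a^x/\{x\log(1-a)\}$ exactly, so the sample moments
should settle near the LSD-1 values as the number of draws grows.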
3.8 Mixed BCD model. Suppose now that the mixing distribution
with p.d.f.

$$ 1/\{(1 + u)\log(1 + L)\}, \qquad 0 < u < L, $$

be applied to the BCD-m with p.g.f.

$$ (1 - \phi + \phi\,p'z)/(1 + u\phi - u\phi\,p'z), \qquad p'd = 1. $$

The resultant distribution is MLSD-m since its p.g.f. is

$$ \frac{\left[\log\{1 - u/(1 + u) + \phi u(1 - p'z)/(1 + u)\}\right]_0^L}{-\log\{1 + L\}}
   = [\log\{1 + L\} - \log\{1 + \phi L - \phi L\,p'z\}]/\log\{1 + L\}. $$


3.9 Mixed shifted-Poisson model. The mixed shifted-geometric
model in subsection 3.7 has important implications. Since the
geometric is a mixed Poisson distribution, LSD-1 must be a mixed
shifted-Poisson. We have

$$ \int_0^\infty s\,e^{\lambda(s-1)}\,e^{-\lambda/u}\,(1/u)\,d\lambda = s/(1 + u - us); $$

because repeated mixing operations obey the associative law, the
required mixing distribution therefore has the p.d.f.

$$ \int_0^L \frac{e^{-\lambda/u}}{u(1 + u)\log(1 + L)}\,du
   = \frac{e^{\lambda}E_1(\lambda + \lambda/L)}{\log(1 + L)} $$

where $\lambda$ has support $[0, \infty)$ and $E_1(\cdot)$ is the exponential
integral. Finally we obtain

$$ \int_0^\infty s\,e^{\lambda(s-1)}\,\frac{e^{\lambda}E_1(\lambda + \lambda/L)}{\log(1 + L)}\,d\lambda
   = \frac{\log\{1 - Ls/(1 + L)\}}{\log\{1 - L/(1 + L)\}}. $$

Thus the shifted univariate logarithmic distribution is a mixed
Poisson. The multivariate analogue of the model is

$$ \int_0^\infty (p'z)\,e^{\lambda(p'z - 1)}\,\frac{e^{\lambda}E_1(\lambda + \lambda/L)}{\log(1 + L)}\,d\lambda
   = \log\{1 - L\,p'z/(1 + L)\}/\log\{1 - L/(1 + L)\} $$

where $p'd = 1$.


3.10 The Poisson cluster model for LSD-1. Katti (1967) proved
the infinite divisibility of LSD-1 when shifted so as to have
support 0,1,2,.... The shifted LSD-1 therefore arises from a
Poisson distribution of clusters of varying sizes. Explicit
formulae for the cluster-size probabilities have been given by
Kemp (1978). The multivariate analogue of this property of LSD-1
is

$$ \exp[h(p'd)\{h(p'z)/h(p'd) - 1\}] = \log(1 - a\,p'z)/[p'z\,\log(1 - a)] $$

where $h(s)/h(1)$ is the univariate cluster size distribution,
$h(u) = \sum_{i=1}^{\infty} B_i^{(i)}(-au)^i/i!\,i$, $p'd = 1$, and
$B_i^{(i)}(\cdot)$ is the generalized Bernoulli polynomial. The
infinite divisibility property of shifted LSD-1 cannot be extended
to LSD-m.


3.11 Multivariate damage models. Multinomial classification of
univariate counted objects creates a multivariate distribution
from a univariate one; this is the multivariate generalization
of the Rao damage model. Classification using SMD-m converts
LSD-1 into LSD-m; classification using MD-m converts LSD-1 into
MLSD-m.


All the modes of genesis in this section have been given in 
their multivariate forms for the sake of generality. The focus 
of interest in this paper is however LSD-2; bivariate versions 
of all the above models can be obtained by taking m= 2. The 
importance of LSD-2 in many areas of application becomes clear
when one considers the wide scope of the models given in this 
section. 


4. CONDITIONALITY PROPERTIES OF HOMOGENEOUS 
BIVARIATE DISTRIBUTIONS 


Consider now any homogeneous bivariate distribution with
p.g.f. $G_{x,y}(s,t) = H(as + bt)$. Then

$$ G_x(s) = G_{x,y}(s,1) = H(as + b), $$
$$ G_{x+y}(s) = G_{x,y}(s,s) = H(as + bs), $$
$$ G_{x-y}(s) = G_{x,y}(s,1/s) = H(as + b/s). $$

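Subrahmaniam (1966) has shown that for non-homogeneous as
well as homogeneous bivariate distributions

$$ G_{x|y=n}(s) = \left[\frac{\partial^n G_{x,y}(s,t)}{\partial t^n}\right]_{t=0}
   \Big/ \left[\frac{\partial^n G_{x,y}(s,t)}{\partial t^n}\right]_{s=1,\,t=0}. $$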

A simpler result can be obtained when (X, Y) has a homogeneous
bivariate distribution, for then $G_{x,y}(s,t) = H(as + bt)$, and

$$ G_{x|y=n}(s) = H^{(n)}(as)/H^{(n)}(a). $$

Note that this is independent of b.

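Using a Subrahmaniam type of argument, the distribution of
X conditional on X + Y = n can be shown to have the SMD-1 p.g.f.

$$ G_{x|x+y=n}(s) = \left[\frac{\partial^n H(ast + bt)}{\partial t^n}\right]_{t=0}
   \Big/ \left[\frac{\partial^n H(ast + bt)}{\partial t^n}\right]_{s=1,\,t=0}
   = \left(\frac{as + b}{a + b}\right)^n $$

whatever the form of $H(\cdot)$.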
For the distribution of X conditional on Y - X = n we
can show that

$$ G_{x|y-x=n}(s) = \frac{\text{coeff. of } t^n \text{ in } H(as/t + bt)}
                         {\text{coeff. of } t^n \text{ in } H(a/t + bt)}. $$

Now $H(as/t + bt)$ will have a Laurent expansion in t. If it is
possible to separate it into the sum of two functions $H_1$ and $H_2$
with power series expansions in t and 1/t, respectively, then
the conditional p.g.f.'s can again be obtained by partial
differentiation.

Consider now the p.g.f. for LSD-2

$$ G_{x,y}(s,t) = \log(1 - as - bt)/\log(1 - a - b). $$


The two marginal distributions are both MLSD-1, with p.g.f.'s

$$ G_x(s) = \log(1 - as - b)/\log(1 - a - b) $$

and

$$ G_y(t) = \log(1 - a - bt)/\log(1 - a - b). $$

The marginal distribution of X + Y is LSD-1, with p.g.f.

$$ G_{x+y}(s) = \log\{1 - (a + b)s\}/\log(1 - a - b). $$

The conditional distribution of X|Y = 0 is LSD-1, with p.g.f.

$$ G_{x|y=0}(s) = \log(1 - as)/\log(1 - a). $$

Similarly the conditional distribution of Y|X = 0 is LSD-1 with
p.g.f.

$$ G_{y|x=0}(t) = \log(1 - bt)/\log(1 - b). $$

However, the conditional distributions of Y|X = n and X|Y = n,
n ≠ 0, are NBD-1, not LSD-1; the p.g.f.'s are

$$ G_{y|x=n}(t) = \left(\frac{1 - b}{1 - bt}\right)^n \quad\text{and}\quad
   G_{x|y=n}(s) = \left(\frac{1 - a}{1 - as}\right)^n. $$


The conditional distribution of X|X + Y = n is necessarily
SMD-1. We have

$$ G_{x|x+y=n}(s) = \left(\frac{as + b}{a + b}\right)^n \quad\text{and}\quad
   G_{y|x+y=n}(t) = \left(\frac{a + bt}{a + b}\right)^n. $$
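
These conditional results can be checked numerically from the LSD-2
probabilities themselves. The sketch below assumes the probability
function obtained by expanding the p.g.f., namely
$P(X=x, Y=y) = \binom{x+y}{x} a^x b^y / \{-(x+y)\log(1-a-b)\}$ for
$x + y \ge 1$; the function names and the values a = 0.1, b = 0.8,
n = 4 are illustrative assumptions.

```python
import math

def lsd2_pmf(x, y, a, b):
    """P(X = x, Y = y): coefficient of s^x t^y in log(1-as-bt)/log(1-a-b)."""
    n = x + y
    if n == 0:
        return 0.0
    return math.comb(n, x) * a ** x * b ** y / (-n * math.log(1.0 - a - b))

def x_given_sum(n, a, b):
    """Distribution of X given X + Y = n, indexed by x = 0..n."""
    joint = [lsd2_pmf(x, n - x, a, b) for x in range(n + 1)]
    total = sum(joint)
    return [p / total for p in joint]

def x_given_y(n, a, b, terms=200):
    """Distribution of X given Y = n (n >= 1), truncated after `terms` values."""
    joint = [lsd2_pmf(x, n, a, b) for x in range(terms)]
    total = sum(joint)
    return [p / total for p in joint]

a, b, n = 0.1, 0.8, 4  # illustrative values only
binomial = [math.comb(n, x) * (a / (a + b)) ** x * (b / (a + b)) ** (n - x)
            for x in range(n + 1)]
negbin = [math.comb(n + x - 1, x) * a ** x * (1.0 - a) ** n for x in range(200)]
gap_sum = max(abs(p - q) for p, q in zip(x_given_sum(n, a, b), binomial))
gap_y = max(abs(p - q) for p, q in zip(x_given_y(n, a, b), negbin))
print(gap_sum, gap_y)
```

The first comparison is with the binomial implied by
$\{(as+b)/(a+b)\}^n$ and the second with the negative binomial implied
by $\{(1-a)/(1-as)\}^n$.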
All these results are special cases of the conditionality
results given by Patil and Bildikar (1967), and can of course also 
be obtained as special cases of the results for homogeneous 
bivariate distributions. 


We come now to new results; these concern the marginal and
conditional distributions relating to the difference between the
values of X and Y. The marginal distribution of X - Y has
the p.g.f.

$$ G_{x-y}(s) = \frac{\log(1 - as - b/s)}{\log(1 - a - b)}
   = \frac{\log[(1 - a\lambda s)(1 - b\lambda/s)/\lambda]}{\log(1 - a - b)}
   = \frac{\log(1 - a\lambda s) + \log(1 - b\lambda/s) - \log(\lambda)}{\log(1 - a - b)} $$

where $\lambda = \{1 - (1 - 4ab)^{1/2}\}/2ab$. Hence

$$ \mathrm{Prob}(x - y = 0) = -\log(\lambda)/\log(1 - a - b). $$

Moreover (X - Y)|X > Y is LSD-1 with p.g.f. given by
$\log(1 - a\lambda s)/\log(1 - a\lambda)$. Similarly (Y - X)|X < Y is
LSD-1 with p.g.f. $\log(1 - b\lambda s)/\log(1 - b\lambda)$.
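
The marginal distribution of X - Y can be confirmed by summing the
LSD-2 probabilities along each diagonal and comparing with the closed
forms implied by the factorized p.g.f. This is a minimal sketch: the
function names are assumptions, the joint probabilities are taken from
the series expansion of the p.g.f., and the series is truncated after
400 terms.

```python
import math

def lsd2_log_pmf(x, y, a, b):
    """log P(X = x, Y = y) for LSD-2, valid for x + y >= 1."""
    return (math.lgamma(x + y) - math.lgamma(x + 1) - math.lgamma(y + 1)
            + x * math.log(a) + y * math.log(b)
            - math.log(-math.log(1.0 - a - b)))

def diff_prob_by_summation(n, a, b, terms=400):
    """P(X - Y = n), summed directly along the diagonal x = y + n."""
    total = 0.0
    for y in range(terms):
        x = y + n
        if x < 0 or x + y == 0:
            continue
        total += math.exp(lsd2_log_pmf(x, y, a, b))
    return total

def diff_prob_closed_form(n, a, b):
    """P(X - Y = n) from the factorized p.g.f. of X - Y."""
    lam = (1.0 - math.sqrt(1.0 - 4.0 * a * b)) / (2.0 * a * b)
    norm = -math.log(1.0 - a - b)
    if n == 0:
        return math.log(lam) / norm
    base = a * lam if n > 0 else b * lam
    return base ** abs(n) / (abs(n) * norm)

a, b = 0.1, 0.8
diagonal_gaps = [abs(diff_prob_by_summation(n, a, b) - diff_prob_closed_form(n, a, b))
                 for n in range(-6, 4)]
print("largest discrepancy:", max(diagonal_gaps))
```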


Consider now the distribution of X|Y - X = n, n ≠ 0. We
have

$$ G_{x|y-x=n}(s) = \frac{\text{coeff. of } t^n \text{ in } \log(1 - as/t - bt)}
                         {\text{coeff. of } t^n \text{ in } \log(1 - a/t - bt)}
   = \frac{\text{coeff. of } t^n \text{ in } \log(1 - b\eta t)}
          {\text{coeff. of } t^n \text{ in } \log(1 - b\lambda t)} $$

where $\eta = \{1 - (1 - 4abs)^{1/2}\}/2abs$. Hence

$$ G_{x|y-x=n}(s) = \left[\frac{1 - (1 - 4abs)^{1/2}}{2as}\right]^n
   \Big/ \left[\frac{1 - (1 - 4ab)^{1/2}}{2a}\right]^n
   = \frac{{}_2F_1[n/2, (n + 1)/2; n + 1; 4abs]}{{}_2F_1[n/2, (n + 1)/2; n + 1; 4ab]}, $$

which we can recognize as the univariate lost-games distribution
from Kemp and Kemp (1968). Notice that this is symmetric in (a,b);
consideration of the coefficients of $t^{-n}$ in the requisite
expressions establishes that

$$ G_{x|x-y=n}(s) = s^n\,G_{x|y-x=n}(s). $$

Finally

$$ G_{x|x=y}(s) = \frac{\log(1 + abs\,\eta^2)}{\log(1 + ab\,\lambda^2)}
   = \frac{\log[1 + \{1 - (1 - 4abs)^{1/2}\}^2/4abs]}{\log[1 + \{1 - (1 - 4ab)^{1/2}\}^2/4ab]}
   = \frac{s\;{}_3F_2[3/2, 1, 1; 2, 2; 4abs]}{{}_3F_2[3/2, 1, 1; 2, 2; 4ab]}, $$


see Kemp and Kemp (1969), Kemp (1978). This is the univariate 
distribution which gives rise to the lost-games distribution when 
used as a Poisson cluster-size distribution. Note that it is a 
distribution with support 1, 2, ....


5. CHI-SQUARE GOODNESS-OF-FIT APPLICATION 


The usefulness of these marginality and conditionality results 
becomes clear when we consider the chi-square goodness-of-fit test 
for examining whether or not data generated pseudo-randomly from 
an LSD-2 are fitted by the distribution. Four sets of data, each
comprising 120 data points, were generated from the LSD-2 with
a = 0.1, b = 0.8.


The usual chi-square procedure is to evaluate the expected 
frequencies, grouping them so as to avoid very low expected
frequencies. The chi-square test is of course only approximate 
and there is disagreement concerning the lowest 'safe' size for 
an expected frequency; with a desire to be noncontroversial we 
have adopted a minimum expected-frequency size of 3. For a sample 
size of 120 this corresponds to minimum expected probability equal 
to 0.025. Note that because the data have been generated by a 
pseudo-random process, see Kemp (1980), the theoretical values
of the parameters are known. More usually it would be necessary 
to estimate their values - Patil and Bildikar (1967) show how 
maximum likelihood estimates can be obtained. 


Problems arise as soon as we attempt to group the small 
expected frequencies. It is customary to consider these either 
row-by-row or column-by-column in the two-way array of expected 
frequencies. Sometimes this leads to odd-looking groupings. (Of
course the grouping procedure should be independent of the
observed frequencies.)


In order to avoid giving precedence to either rows or
columns, let us consider the diagonals given by X - Y = n.
For LSD-2 with a = 0.1, b = 0.8, we find that $\lambda = 1.096$.
Since

$$ P(x - y = n) = -a^n\lambda^n/\{n\log(1 - a - b)\}, \qquad n > 0, $$

$$ P(x - y = 0) = -\log(\lambda)/\log(1 - a - b), $$

$$ P(x - y = n) = -b^{|n|}\lambda^{|n|}/\{|n|\log(1 - a - b)\}, \qquad n < 0, $$

we calculate

    P(x - y ≥ 2)  = 0.003      P(x - y = -6)   = 0.033
    P(x - y = 1)  = 0.048      P(x - y = -7)   = 0.024
    P(x - y = 0)  = 0.040      P(x - y = -8)   = 0.019
    P(x - y = -1) = 0.381      P(x - y = -9)   = 0.015
    P(x - y = -2) = 0.167      P(x - y = -10)  = 0.012
    P(x - y = -3) = 0.098      P(x - y = -11)  = 0.009
    P(x - y = -4) = 0.064      P(x - y ≤ -12)  = 0.043.
    P(x - y = -5) = 0.045

Note that a logarithmic series is fitted on each side of the origin; 
when the probabilities get smaller than the somewhat arbitrarily 
chosen figure of 0.025, the standard method of grouping for a 
univariate distribution can be used. These groupings are indica- 
ted by the brackets. 


Consider next the ungrouped probabilities greater than
0.025 × 2. This leads to an examination of the conditional
distributions of (X|X - Y = -1), (X|X - Y = -2), (X|X - Y = -3),
(X|X - Y = -4). The tails of these distributions can again be
grouped in the usual manner, making sure that probabilities
greater than 0.025 are attained everywhere.


Table 1 gives the final groupings in column (1), together 
with their expected frequencies in column (2). Columns (3)-(6) 
show the observed frequencies for the four data sets and the 
corresponding chi-square values. 
TABLE 1: Bivariate frequency distributions for four pseudo-
randomly-generated data sets from LSD-2, a = 0.1, b = 0.8.

                          Expected       Observed frequencies
    Group                 frequency      for four data sets
    (1)                   (2)            (3)     (4)     (5)     (6)

    x > y                   6.05           7       7       6       6
    x = y                   4.78           4       7       5       6
    x = 0, y = 1           41.69          36      42      41      46
    x > 0, y = x + 1        4.01           6       5       5       6
    x = 0, y = 2           16.68          13      15      21      18
    x > 0, y = x + 2        3.36           4       5       2       0
    y = x + 3              11.71          19      11       9      11
    y = x + 4               7.70          10       7       6       5
    y = x + 5               5.40           7       2       7       4
    y = x + 6               3.95           2       3       6       2
    y = x + 7, 8            5.25           3       3       3       7
    y = x + 9, 10, 11       4.29           5       8       3       1
    y - x ≥ 12              5.12           4       5       6       8

    Chi-square                          10.95    9.04    5.98   12.25


The grouping procedure uses the results obtained in Section 4 
concerning the bilateral marginals and their associated conditional 
distributions for LSD-2. It has been adopted because it is an 
objective analogue of the usual univariate procedure. As in the 
univariate case the only real opening for controversy concerns 
the choice of minimum expected-frequency size. 
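
The expected frequencies and chi-square values of Table 1 can be
recomputed from the results of Section 4. The sketch below is
illustrative: the variable names are assumptions, and the observed
counts are transcribed from Table 1 with each data set required to
total 120, so they should be treated as an assumption if a cleaner
copy of the table disagrees.

```python
import math

a, b, sample_size = 0.1, 0.8, 120
norm = -math.log(1.0 - a - b)
lam = (1.0 - math.sqrt(1.0 - 4.0 * a * b)) / (2.0 * a * b)

def p_y_minus_x(k):
    """P(Y - X = k) for k >= 1."""
    return (b * lam) ** k / (k * norm)

def p_x0(k):
    """P(X = 0, Y = k) for k >= 1."""
    return b ** k / (k * norm)

p_x_gt_y = -math.log(1.0 - a * lam) / norm
p_x_eq_y = math.log(lam) / norm
p_y_gt_x = -math.log(1.0 - b * lam) / norm

group_probs = [
    p_x_gt_y,                                   # x > y
    p_x_eq_y,                                   # x = y
    p_x0(1),                                    # x = 0, y = 1
    p_y_minus_x(1) - p_x0(1),                   # x > 0, y = x + 1
    p_x0(2),                                    # x = 0, y = 2
    p_y_minus_x(2) - p_x0(2),                   # x > 0, y = x + 2
    p_y_minus_x(3), p_y_minus_x(4), p_y_minus_x(5), p_y_minus_x(6),
    p_y_minus_x(7) + p_y_minus_x(8),            # y = x + 7, 8
    sum(p_y_minus_x(k) for k in (9, 10, 11)),   # y = x + 9, 10, 11
    p_y_gt_x - sum(p_y_minus_x(k) for k in range(1, 12)),  # y - x >= 12
]
expected = [sample_size * p for p in group_probs]

# Observed counts for the four data sets, as read from Table 1.
observed = [
    [7, 4, 36, 6, 13, 4, 19, 10, 7, 2, 3, 5, 4],
    [7, 7, 42, 5, 15, 5, 11, 7, 2, 3, 3, 8, 5],
    [6, 5, 41, 5, 21, 2, 9, 6, 7, 6, 3, 3, 6],
    [6, 6, 46, 6, 18, 0, 11, 5, 4, 2, 7, 1, 8],
]
chi_square = [sum((o - e) ** 2 / e for o, e in zip(column, expected))
              for column in observed]
print([round(e, 2) for e in expected])
print([round(c, 2) for c in chi_square])
```

Because the expected frequencies are carried at full precision here,
the chi-square values may differ from the printed ones in the last
decimal place.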
A SURVEY OF ESTIMATING DISTRIBUTIONAL PARAMETERS
AND SAMPLE SIZES FROM TRUNCATED SAMPLES 


SAUL BLUMENTHAL 
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801 USA 


SUMMARY. Suppose $X_1, \ldots, X_N$ are independent random variables
with common distribution $F(x;\theta)$. We observe $X_i$ only if it
lies outside a given region R. Thus the number n of observed
X's is a binomial (N,P) variate, where P = 1 - P(X in R).
Based on n and the values of the observed X's, we want to
estimate N and $\theta$. Conditional and unconditional maximum
likelihood estimators and modifications of these will be 
discussed. Asymptotic results of both first and second order 
will be considered, and small sample results also mentioned. 
Sequential and two stage results will receive brief coverage. 

A variety of applications will be given. 


KEY WORDS. Estimating n, sample size estimation, asymptotic 
expansions, second order efficiency, truncated samples. 


1. DISCUSSION OF RESULTS 


In many sampling situations, observations are restricted to 
a part of the possible range of population values. Of a set of 
N potential observations, only n remain observable as a result 
of the restricting process, while (N-n) are eliminated by it. 
The resulting set of incomplete data is commonly referred to as 
censored when N (or N-n) is known, and as truncated otherwise.
In life testing which is designed to estimate the average life 
time of, for example, transistors from a given production line, 
typically, a certain number of items are tested for a fixed 
length of time. This gives rise to a censored sample of lifetimes 
since the lifetimes of items surviving the life test are unknown. 
In certain life testing situations the total number of items on
test is not known. This happens, for instance, when among the 
items put on a life test there is a certain unknown number of 
items with a particular defect identifiable only after the item 
fails. If the lifetime of an item with this particular defect 
is the variable of interest, then, the sample is truncated in 
that the number of missing observations of lifetime greater 

than the burn-in or testing period is unknown. An important 
problem that arises in such situations is estimating the number 
of remaining defectives of a particular type after an initial
burn-in period (Blumenthal and Marcus, 1975a,b). Sequential 
estimation procedures in which the test period is not pre- 
determined and for which one can guarantee with a high probability 
that the observed number of defectives of the given type equals 
the total number in the batch are of special interest in 
reliability. They can be applied as screening or sequential 
burn-in procedures which, with a specified level of confidence, 
remove all defectives in the batch (Marcus and Blumenthal, 
1974). In the quality control context in which a defective 

can be identified without a life test, such screening procedures 
have been in use for many years. 


Censored and truncated samples also arise in production 
engineering when sorting or inspection procedures eliminate items 
above or below designated tolerance limits. The measurements 
made on the items within the tolerance limits represent either 
a censored or truncated sample depending on whether a count 
was kept of the number which failed to meet the specifications. 


Another interesting example arises in the software reliability 
literature. Jelinski and Moranda (1972) postulate a model for 
the initial error content N of a program. This model can 
be paraphrased by saying that with each error is associated a 
random variable representing the running time of the program 
up to the time of detection of the error. These errors are 
independent identically distributed random variables. After 
running the program for a fixed period of time, an estimate of 
the number of undetected errors is required. Technically, this 
is exactly the same problem as estimating the number of remaining 
defectives after a life test. 


Many sociological, epidemiological and genetic or medical 
investigations seek to study a target population of persons 
characterized by a trait that occurs only rarely in the popula- 
tion at large. Ascertainment of the study group is often done 
by merging preexisting but incomplete lists (such as birth 
records, death certificates, records from hospitals, schools 
and other institutions and disease registers) of members of the 
target population. However, there may be cases not listed by 
any source. Thus, the problem is to estimate the number of 
such unlisted cases from a truncated sample (Wittes, 1970;
Wittes et al., 1974). A similar situation arises in visual 
scanning experiments in particle physics where film containing 
photographs of particles is scanned separately by two or more 
scanners all of whom might miss some particles with very poor 
visibility (Sanathanan, 1972b). 


The truncated Poisson typically arises when the zero 
class is unobservable, owing to the fact that only cases where 
at least one event occurs are reported while the total number 
of cases is unknown. An example would be the number of accidents 
per worker in a factory. While it is a simple matter to count 
over a given period of time the number of workers sustaining 
one or more accidents, the number of persons who did not have 
an accident could not be enumerated owing to the factory population
fluctuating during that period. Another type of Poisson
data with the zero class missing would be that of the number of 
persons per house who are infected with a disease such as 
measles. The zero class represents households of which one or
more members have come in contact with an infected individual,
but none of its members has contracted the disease. The number 
of such households is difficult to assess. 


An example of the truncated negative binomial distribution 
is the following, given by Sampford (1955). If chromosome 
breaks in irradiated tissue can occur only in those cells which 
are at a particular stage of the mitotic cycle at the time 
of irradiation, a cell can be demonstrated to have been at that 
stage only if breaks actually occur. Thus in the distribution 
of breaks per cell, cells not susceptible to breakage are 
indistinguishable from susceptible cells in which no breaks 
occur. 


Goel and Joglekar (1976) fit the truncated negative binomial 
distribution to field failure data for several types of systems. 
For each system type, N units were in use for a known period 
of time T, and a count $n_x$ was available showing how many of
the units failed x times (x = 1, 2, ...) during the test
period. No count was available for the number of units showing
zero failures in the test period, i.e. N and $n_0$ were unknown.


A somewhat different example is given by Rao et al. (1973) 
who fit a negative binomial distribution to the counts of number 
of children per family in a given sample. In this sample there 
are M childless families which are believed to consist of two 
subgroups: one which is basically sterile and biologically a 
separate group, and one which is fertile but did not have 
children. The number $n_0$ in the latter group is the required
frequency, and except for being bounded above by M, it is
unknown. Thus, the zero-truncated negative binomial distribution
must be used.


Censoring or truncation of a random variable X implies 
that there is a region R where if X is in R, X is not 
observable. In the case of a continuous variable (e.g. survival 
or reaction time) this generally means the values of the missing 
observations are known to lie in a certain open ended interval. 
On the other hand, for a discrete variable with a missing category 
it means that the missing observations belong to some known 
category, although their number is unknown. Such data are 
therefore necessarily of the truncated type. 


There are three distinct sampling rules that are associated 
with truncated data. 


1. The total number of random variables N is fixed, but
unknown. The n observations obtained represent the
values among the N which fall inside the observable
region $\bar{R}$ ($\bar{R}$ is the complement of R). In this case,
n is a binomial random variable with parameters N
and P = P(X in $\bar{R}$). This is referred to as binomial
sampling, and will be the only situation covered in
the present survey in detail.


2. N fixed and unknown as in 1. The n observations
obtained represent the central order statistics with
the smallest $S_0$ and largest $S_1$ values being
censored. If $S_0$ and $S_1$ were known, this would be
straightforward censored sampling. With $S_0$ and $S_1$
unknown, this could be referred to as pseudo-censored
sampling, and has been discussed by Blumenthal and
Marcus (1975a,b) and Johnson (1962, 1967, 1970, 1971,
£972, 91973 , 497 Gag be el I7G)s 


3. The underlying population from which sampling occurs 
is assumed to be infinite and values of the random 
variables in question are unobservable in a certain 
region R. Sampling is continued until a predetermined 
number n of observations is reached. The total number 
of random variables N _ generated in the process is 
now random with a Pascal distribution and is not known. 
This is referred to as the inverse binomial sampling 
rule. For further discussion of this situation, see 
Blumenthal and Sanathanan (1980). 
Two types of problems are dealt with primarily in connection
with the binomial sampling model for truncated data. One type 
calls for the estimation of the unknown number of missing obser- 
vations. The software reliability problem, and one of the life 
testing problems discussed earlier are of this type. Estimation 
of the other parameters is only incidental in such problems 
whereas for the other type of truncated sampling problems al- 
though the number of missing observations is unknown, the focus 
is on the estimation of distributional parameters. This is the 
case with many life testing situations. 


Estimation on the basis of truncated samples was dealt with 
in the early literature using the conditional approach almost 
exclusively. This approach consists of eliminating the unknown 
total sample size from consideration by assuming the number of 
observations to be fixed and then examining the conditional 
distribution of the given observations namely, the truncated 
distribution. The traditional approach to parameter estimation 
from truncated samples has been through these distributions. In
many cases, the only estimation techniques available are of
this conditional type. Thus, inference for truncated distribu- 
tions plays an important role in any examination of inference 
for truncated samples. In particular, results of Tukey (1949) 
and Smith (1957) showing that sufficiency and completeness are 
inherited by the truncated distribution from the parent are 
particularly important. These have been applied to derive uni- 
formly minimum variance unbiased estimators of reliability via 
the Rao-Blackwell technique by Holla (1967) for exponential 
distributions and Nath (1975) for gamma distributions. Carry 
over of monotonicity properties from the parent distribution can 
be helpful in developing moment estimators. Gross (1971) worked 
on monotonicity for truncated distributions. 


A great many papers have appeared dealing with estimation 
of parameters of specific truncated distributions such as normal, 
exponential, Poisson, and negative binomial among others. For 
a thorough survey of the literature the books by Johnson and
Kotz (1969, 1970a,b, 1972) are recommended. 


Concern for estimating the number of missing observations 
appears much less often in the literature. When it has appeared, 
it has usually taken the form of a conditional estimator. If N 
is the total number of possible random variables and n is the 
actual number observed, we can estimate N by 


$$ \hat{N} = [n/\hat{P}] \qquad (1) $$

where [x] is the greatest integer in x and where $\hat{P}$ is an
estimator of P(X in $\bar{R}$). The conditional approach is to
estimate P through the truncated distribution. This approach
is found for instance in Hartley (1958) for the missing zero class
of the negative binomial, in Dahiya and Gross (1973) for the
Poisson, and in Sanathanan (1972a) for the multinomial.
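
As an illustration of the conditional approach for the missing zero
class of the Poisson, the following minimal sketch estimates the
Poisson mean from the zero-truncated distribution and then applies
(1). The counts are hypothetical, and the function names and the
bisection solver are assumptions made for illustration, not a
procedure taken from the papers cited.

```python
import math

def conditional_poisson_mle(mean_obs, iterations=200):
    """Solve lam / (1 - exp(-lam)) = mean_obs by bisection (needs mean_obs > 1)."""
    lo, hi = 1e-12, mean_obs  # the root always lies below the observed mean
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < mean_obs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def estimate_N(counts):
    """Conditional estimate of N from counts {x: number of units with value x}."""
    n = sum(counts.values())
    total = sum(x * m for x, m in counts.items())
    lam = conditional_poisson_mle(total / n)
    p_hat = -math.expm1(-lam)  # estimated probability of being observed
    return math.floor(n / p_hat), lam, p_hat

counts = {1: 50, 2: 25, 3: 8, 4: 2}  # hypothetical data: n = 85 observed units
N_hat, lam_hat, p_hat = estimate_N(counts)
print(N_hat, lam_hat, p_hat)
```

When the observed mean equals 1 (every observed unit has x = 1) the
likelihood equation has no positive root, which is an instance of the
undefined conditional estimate discussed in the text.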


Working with the exponential distribution, Blumenthal and
Marcus (1975b) introduced the unconditional approach to estimating
N and the distributional parameters. The unconditional
likelihood is just the product of the conditional likelihood given n
and the binomial likelihood for n. It thus takes the random
nature of n into account, and unconditional maximum likelihood
estimates $\hat{N}$ can be appreciably different from their conditional
counterparts. In that paper, it was observed that certain sample
configurations cause the conditional $\hat{N}$ to be undefined and a
subset of these samples still leave the unconditional $\hat{N}$
undefined. To avoid this situation, Blumenthal and Marcus examined
Bayes modal estimators with respect to a conjugate prior density
for the parameter of the exponential distribution. They found
that these "modified" estimators had a form very similar to
that of the maximum likelihood estimators and would not become
undefined for any samples. In the sense of the usual asymptotic
distribution theory these estimators could not be distinguished
from the m.l.e.'s, i.e. we could express $\hat{N}$ as

$$ \hat{N} = N + \sigma\sqrt{N}\,Z + O(1) \qquad (2) $$

where Z is a function of the observations having mean zero
and variance one whose distribution becomes normal as N
increases, and the same Z and $\sigma$ can be used to describe all
of the estimators discussed so far. Thus, to make asymptotic
distinctions among the estimators which might have some bearing
on small sample properties, the O(1) term was examined in
more detail, being split into a constant $\beta$ and an $O(1/\sqrt{N})$ term.
It was found that the $\beta$ term does differ among the estimators
with the conditional estimator having a systematically larger $\beta$
than its competitors, reflecting a tendency to overestimate even
when finite valued. The expectation of $\beta$ can be thought of as
an "asymptotic bias." By manipulating the parameters of the
conjugate weight function, they found a $\beta$ which minimizes the
maximum $E\beta$ over all parameter values. This particular "minimax
bias" modified m.l.e. is a strong candidate to replace the m.l.e.,
and simulations by Watson and Blumenthal (1980) verify its
superiority to the two competing m.l.e.'s.


This same approach was applied to estimating the missing 
zero class for the Poisson by Blumenthal, Dahiya and Gross (1978). 
Modified u.m.l.e.'s were shown to be superior to their unmodified
competitors both asymptotically and for small samples. 
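
For the Poisson with a missing zero class, the unconditional
likelihood (conditional likelihood given n times the binomial
likelihood for n) reduces to $\binom{N}{n} e^{-\lambda N}\lambda^{S}$ up to a
constant, where S is the sum of the observed values. For fixed N it is
maximized at $\lambda = S/N$, so the unmodified unconditional m.l.e. of N
can be found by a one-dimensional search. The sketch below is an
illustration under those assumptions, with hypothetical counts and
assumed function names; it does not implement the modified estimators
discussed above.

```python
import math

def profile_loglik(N, n, total):
    """Unconditional log-likelihood of N with the Poisson mean set to total / N."""
    log_binom = math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
    return log_binom + total * math.log(total / N) - total

def unconditional_mle(counts, search_factor=20):
    """Integer N >= n maximizing the profile likelihood, and the matching mean."""
    n = sum(counts.values())
    total = sum(x * m for x, m in counts.items())
    # A finite maximum needs total > n; the search range is an arbitrary cap.
    candidates = range(n, search_factor * n + 1)
    N_hat = max(candidates, key=lambda N: profile_loglik(N, n, total))
    return N_hat, total / N_hat

counts = {1: 50, 2: 25, 3: 8, 4: 2}  # hypothetical data: n = 85, sum = 132
N_u, lam_u = unconditional_mle(counts)
print(N_u, lam_u)
```

For these hypothetical counts the unconditional estimate falls slightly
below n divided by the conditionally estimated observation probability,
which illustrates how the two approaches can differ.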


For any single parameter distribution satisfying some mild
regularity conditions, Blumenthal (1977) derives expressions for
the "asymptotic bias" of $\hat{N}$ to allow comparisons of the various
likelihood type estimators. The results are specialized to scale
parameter families and the "minimax bias" modified m.l.e.'s are
characterized.


For distributions depending on a multidimensional parameter
only first order asymptotic properties have been studied.
Sanathanan has obtained the limiting joint normal distribution of
$\hat{N}$ and $\hat{\theta}$ for the multinomial distribution (1972a) and for
a more general class of regular distributions (1977).


If instead of trying to control asymptotic bias, one looks
at a correction term to $E(\hat{N} - N)^2$, then the $O(1)$ term in
(2) must be broken into $\beta + (\gamma/\sqrt{N}) + \cdots$ and the
contribution of $\gamma$ to the mean squared error examined. Watson and
Blumenthal (1980) have done this and found modified estimators 
which approximately minimize maximum mean squared error. Monte 
Carlo comparisons suggest that these offer some improvement 
over the minimax bias estimators, but relatively little. 


Blumenthal (1980) has considered both modified u.m.l.e.'s
and modified c.m.l.e.'s obtained by using a conjugate prior on
the conditional likelihood. The conjugate priors for the
conditional and unconditional cases are not the same. For
estimating $N$ or a parameter $\theta$, using minimax bias as a
criterion, sometimes the modified u.m.l.e. is best and
sometimes the modified c.m.l.e. Generally, modified c.m.l.e.'s
share the desirable property that $\hat{N}$ will exist for all samples.
For obtaining first and second order asymptotic properties, the
modified m.l.e.'s can be regarded as members of a class of
estimators of the form $\hat{\theta} + (\phi(\hat{\theta})/n)$ where appropriate choice
of $\phi(\theta)$ allows duplication of the asymptotic properties of
any given modified m.l.e. starting with any modified m.l.e. $\hat{\theta}$.
However, small sample properties are not readily reproduced,
since $\hat{\theta} + (\phi(\hat{\theta})/n)$ will lead to nonexistent $\hat{N}$ through (1),
whenever $\hat{\theta}$ does. It is fairly common to use $\phi(\theta) = -nE(\hat{\theta} - \theta)$
so as to make the "adjusted" $\hat{\theta}$ unbiased to a second order
approximation. Adjustment of $\hat{N}$ can be achieved either through
adjusting $\hat{\theta}$ in (1) or by directly creating $\hat{N} + \delta(\hat{\theta})$, where
for instance $\delta(\theta) =$ [illegible in source] might be used.


Extending results already available for complete samples,
Blumenthal (1980) shows that the correction to mean squared
error will in fact depend only on $\beta$ under mild regularity
conditions making derivation of the correction much simpler.
Relations between $\beta$ and various measures of second order
efficiency are explored, and Edgeworth expansions for the
distribution of $\hat{N}$ are obtained and related to the expansion
(2). Also conditional expansions of the form

$$\hat{N} = (n/P) + A\sqrt{n} + B + C/\sqrt{n} + \cdots \tag{3}$$

are examined. Using (3), expressions for $E[\hat{N} - (n/P) \mid n]$ and
$E[(\hat{N} - (n/P))^2 \mid n]$ can be obtained and these depend only on the
truncated distributions. It is shown that comparisons between 
estimators on the basis of these criteria (to first or second 
order) are unchanged by obtaining the unconditional bias or 
mean squared error. Hence unconditionally valid comparisons 
among estimators may be made strictly on a conditional basis as 
had been done historically. However, this does not imply that 
the conditional estimators are necessarily as good as the un- 
conditional ones with respect to the conditional comparisons. 


A somewhat different approach to controlling mean squared 
error for estimating N is taken by Watson and Blumenthal 
(1981). For the exponential distribution, it was noted that
$\sigma$ in (2) is an unbounded increasing function of the ratio of the
scale parameter $\theta$ to the duration $T$ of the test. For fixed
duration tests, $\sigma$ cannot be bounded. A two stage procedure
is considered in which a fixed time first stage leads to an
initial $\hat{\theta}$ which in turn is used to determine the duration of
the second stage. It is shown that for large $n$, the total
duration can be controlled so that for any $\theta$, the resulting
$\sigma$ is below a specified value $\sigma_0$.


In case the distribution is known completely so that only 
N is unknown, or if no inference regarding the distribution is 
desired, Blumenthal and Marcus (1975a) use Bayesian procedures 
which involve only a "marginal" prior for N. 


When $N$ is estimated, a goodness of fit test may be desired
for some given density $f(x;\theta)$. Dahiya (1980b) has shown that
the usual Pearson chi-squared test can be used but the result of
estimating $N$ is not to reduce the degrees of freedom by one.
Instead, using $\hat{N}$ causes the distribution to be that of a
weighted average of two independent chi-squared variates which
are somewhat larger than the single chi-square with one less
degree of freedom.


Computing the simultaneous estimates of $\theta$ and $N$ for
modified unconditional m.l.e.'s can create some difficulties
as an examination of Blumenthal and Marcus (1975b) or Blumenthal,
Dahiya and Gross (1978) will show. Simplification of the
computation can be achieved by using a technique developed by
Dahiya (1980a) for finding the modal estimator of any integer
valued parameter.


2. APPLICATIONS 


Many of the references cited contain real data from which 
an estimate of N has been computed. In most of the literature 
prior to Blumenthal and Marcus (1975b), only the c.m.l.e. was 
found. We have mentioned simulation studies to verify that the 
asymptotic theory reasonably describes small sample properties. 
In one paper, namely that of Jelinski and Moranda (1972),
follow-up studies did lead to a verification of the original
$\hat{N}$ estimate. The authors assumed an exponential distribution,
derived the u.m.l.e. assuming $N$ to be continuous, and computed
$\hat{N}$ for three different data sets. The respective values of
$(n,\hat{N})$ were (26, 31.1), (14, 17.4) and (12, 14.45), based on the
production checkout phase. A subsequent test phase led to
detection of additional errors and updated $(n,\hat{N})$ values of
(31, 31.6), (15, 17.1) and (13, 14.54). Update data based on almost
one more year of operation was reported by Moranda (1975) and
showed $n$ values of 32, 16, and 14 respectively. These can be
taken as very close approximations to the true $N$. Using the
formulas in Blumenthal and Marcus (1975b), we find that for the
production phase data the integer valued u.m.l.e. and the
minimum asymptotic bias modified u.m.l.e. agree with one another
and take the values (31, 17, 14) respectively. The c.m.l.e.'s
are computed to be (33, 22, 18) which are seen to be too large
as asymptotic theory indicates.
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SUMMARY. The multiparameter exponential families of probability 
density functions include many commonly used densities such as 
the normal, gamma, and beta, and many others besides. We consider 
the general parameter estimation problem for these families, 
given observations that are restricted (i.e. truncated) to the 
closed interval [a,b]. We show how to find maximum likelihood 
estimators using Newton-Raphson iteration, but observe that in 
general this technique requires numerical integration within 
each iteration. Then we give a new non-iterative method for 
obtaining the desired estimators for the parameter vector. An 
example using the generalized Inverse Gaussian is given. 


KEY WORDS. Exponential families, maximum likelihood, non- 
iterative estimation. 


1. INTRODUCTION 


Let $X$ be a random variable which has probability density
function $f(x)$ (with respect to Lebesgue measure), where we
assume $f(x)$ to be that of a canonical exponential family of
truncated densities, meaning that

$$f(x) = \exp\Big[\sum_{i=1}^{m} \tau_i \psi_i(x) - \eta(\tau)\Big] \tag{1}$$

for $x \in [a,b]$ and zero otherwise. The functions $\psi_i$ are
bounded on $[a,b]$, measurable and linearly independent. The
parameter vector $\tau$ is an element of $R^m$ and the normalizing
function $\eta(\tau)$ is determined by the condition that
$\int_a^b f(x)\,dx = 1$.
The class of densities encompassed by (1) includes the exponential,
normal, gamma, beta, inverse gamma, inverse Gaussian, and many
more.


It is well known (see Barndorff-Nielsen, 1978, or Lehmann,
1959, for example) that canonical exponential densities of the
form (1) have many special properties. For instance, it can be
shown that

$$\partial\eta(\tau)/\partial\tau_i = E[\psi_i(X)] \quad\text{and}\quad \partial^2\eta(\tau)/\partial\tau_i\,\partial\tau_j = \mathrm{Cov}[\psi_i(X),\psi_j(X)].$$

Let $\nabla$ be the gradient, and $\psi(x) = \{\psi_1(x),\cdots,\psi_m(x)\}$. Then
$\nabla\eta(\tau) = E[\psi(X)]$ and $H(\tau) = \nabla^2\eta(\tau) = \mathrm{Cov}[\psi(X),\psi(X)]$.
Thus the Hessian matrix $H(\tau)$ of $\eta(\tau)$ is the covariance matrix
of the random variables $\psi_i(X)$ for $i = 1,\cdots,m$. Let $\psi_0$ be
the function that is identically 1 on $[a,b]$. We will assume
throughout that the functions $\{\psi_i(x): i = 0,\cdots,m\}$ are linearly
independent. Under this assumption $H(\tau)$ is positive definite
for each $\tau \in R^m$.


Now consider the problem of maximum likelihood estimation
for $\tau$. Let $X_1,\cdots,X_n$ be a random sample from the probability
distribution having density $f(x)$ given by (1). The log-
likelihood function $L(\tau)$ is given by

$$L(\tau) = -n\eta(\tau) + n\sum_{i=1}^{m} \tau_i g_i,$$

where the sample means $g_i$ are defined by

$$g_i = \frac{1}{n}\sum_{j=1}^{n} \psi_i(X_j), \qquad i = 1,2,\cdots,m.$$

Since $H(\tau)$ is positive definite, it follows (Zangwill, 1969)
that $\eta(\tau)$ is strictly convex over $R^m$. Therefore $L(\tau)$ is
strictly concave over $R^m$ and so the maximum likelihood estimator
(when it exists) must be the unique solution of the equation
$\nabla L(\tau) = 0$. Now let $g$ be the vector $(g_1,\cdots,g_m)$, and note
that

$$\nabla L(\tau) = n\,[\,g - \nabla\eta(\tau)\,].$$

From this it may be seen that the maximum likelihood estimator
(MLE) of $\tau$ must satisfy the equation $g = \nabla\eta(\tau)$.


In Crain (1976) it is shown that one can determine the
minimal sample size necessary to ensure the existence (with probability
1) of the maximum likelihood estimator, given certain regularity
conditions on the functions $\{\psi_i\}$. Wilks (1962) gives
regularity conditions under which the maximum likelihood estimator is
efficient and asymptotically normal. These regularity conditions
are met by the parametric family of densities given in (1). Also,
Lehmann (1959) indicates that $g$ is a complete sufficient
statistic for $\tau$. Thus maximum likelihood estimation would seem to be
the best way to estimate $\tau$, and if the MLE of $\tau$ can be
computed reasonably then one should endeavor to do so. Unfortunately,
there is no general closed form expression for $\eta(\tau)$ other than

$$\eta(\tau) = \log\Big\{\int_a^b \exp\Big[\sum_{i=1}^{m} \tau_i\psi_i(x)\Big]dx\Big\}.$$


This means that to compute the MLE of $\tau$ even approximately, one
must numerically maximize the concave function $L(\tau)$ using some
iterative scheme. For example, the Newton-Raphson method
generates successive estimates $t_0, t_1, \cdots$ for $\tau$ by means of the
recursion formula

$$t_{k+1} = t_k + (g - \nabla\eta(t_k))\,H(t_k)^{-1}.$$

Notice that within each iteration the moment vector $\nabla\eta(t_k)$ and
the covariance matrix $H(t_k)$ must be computed based on the
current parameter estimate $t_k$ of $\tau$. However, since closed
form expressions for these moments do not in general exist, they
may be prohibitively time-consuming to calculate. In this paper
we present a direct, non-iterative method of estimating the
parameter vector $\tau$, given one additional restriction on the functions
$\{\psi_i(x)\}$.
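The Newton-Raphson recursion above can be sketched in a few lines. The following is a minimal illustration rather than the author's implementation: the helper names (`simpson`, `solve_linear`, `moments`, `newton_mle`), the use of composite Simpson quadrature for the moments, and the undamped full Newton step are all assumptions made for the example. It shows why every iteration needs fresh numerical integrals for $\nabla\eta(t_k)$ and $H(t_k)$.

```python
import math

def simpson(fn, a, b, steps=400):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = fn(a) + fn(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * fn(a + i * h)
    return total * h / 3

def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(A[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def moments(tau, psi, a, b):
    """Mean vector and covariance matrix of psi(X) under density (1)."""
    m = len(psi)
    def weight(x):  # unnormalized density exp(sum_i tau_i psi_i(x))
        return math.exp(sum(t * f(x) for t, f in zip(tau, psi)))
    z = simpson(weight, a, b)
    mean = [simpson(lambda x, i=i: psi[i](x) * weight(x), a, b) / z
            for i in range(m)]
    cov = [[simpson(lambda x, i=i, j=j: psi[i](x) * psi[j](x) * weight(x),
                    a, b) / z - mean[i] * mean[j]
            for j in range(m)] for i in range(m)]
    return mean, cov

def newton_mle(g, psi, a, b, start, max_iter=50, tol=1e-10):
    """Iterate t <- t + (g - grad eta(t)) H(t)^{-1} until the step is tiny."""
    t = list(start)
    for _ in range(max_iter):
        mean, cov = moments(t, psi, a, b)
        # H is symmetric, so the row-vector update solves H s = g - mean.
        step = solve_linear(cov, [gi - mi for gi, mi in zip(g, mean)])
        t = [ti + si for ti, si in zip(t, step)]
        if max(abs(s) for s in step) < tol:
            break
    return t

# Usage: psi(x) = x on [0, 1]; g is the exact mean of psi(X) when tau = 1.
tau_hat = newton_mle([1 / (math.e - 1)], [lambda x: x], 0.0, 1.0, [0.0])
```

In this one-parameter illustration the iteration recovers a value close to 1, at the cost of one quadrature per moment per iteration; with $m$ parameters that is $1 + m + m^2$ integrals per step in this naive form.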
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2. ESTIMATION OF t 


In the previous section it was assumed that $\{\psi_i(x)\}$ are
measurable functions bounded on $[a,b]$, such that $\psi_0(x),\cdots,\psi_m(x)$
are linearly independent, and $\psi_0$ is identically 1 on $[a,b]$.
We now assume further that $\psi_i \in C^2[a,b]$, that is, we assume that
the second derivative $\psi_i''(x)$ exists and is continuous for
$x \in [a,b]$, $i = 1,\cdots,m$.


Taking the logarithmic derivative of the density (1), we
obtain for $x \in [a,b]$:

$$d[\log f(x)]/dx = f'(x)/f(x) = \sum_{i=1}^{m} \tau_i\psi_i'(x).$$

Let $\beta \in R^m$, and consider the function $Q(\beta)$ defined by

$$Q(\beta) = \int_a^b \Big[f'(x)/f(x) - \sum_{i=1}^{m} \beta_i\psi_i'(x)\Big]^2 f(x)\,dx. \tag{2}$$

Clearly $Q(\beta) \ge 0$ for all $\beta$, and $Q(\tau) = 0$. Therefore $Q(\beta)$
has a global minimum at $\beta = \tau$, which implies that $\nabla Q(\tau) = 0$.
For arbitrary $\beta$, (2) yields $\nabla Q(\beta) = 2(\beta M - \alpha)$, where

$$M_{ij} = \int_a^b \psi_i'(x)\,\psi_j'(x)\,f(x)\,dx,$$

$$\alpha_i = \int_a^b [f'(x)/f(x)]\,\psi_i'(x)\,f(x)\,dx. \tag{3}$$

Since $\nabla Q(\tau) = 0$, we then have the exact relationship $\tau M = \alpha$.

The relationship $\tau M = \alpha$ is the key to a direct, non-
iterative procedure for estimating $\tau$. If $M$ is invertible and
if the elements of $M$ and $\alpha$ can be estimated, then estimators
for $\tau$ are simply obtained from $\hat{\tau} = \hat{\alpha}\hat{M}^{-1}$. To establish that
this can be done, the following lemmas are necessary.


Lemma 1. Let $X$ have the density (1). Then the $m \times m$ matrix $M$
defined by $M_{ij} = E[\psi_i'(X)\psi_j'(X)]$ is invertible.

Proof. Let $\delta \in R^m$. Then, using $\delta'$ to refer to $\delta$-transpose,

$$\int_a^b \Big[\sum_{i=1}^{m} \delta_i\psi_i'(x)\Big]^2 f(x)\,dx = \delta M\delta'.$$

Thus $\delta M\delta' \ge 0$ for all $\delta \in R^m$, and $\delta M\delta' = 0$ if and only if
$\sum_{i=1}^{m} \delta_i\psi_i'(x) = 0$. This can happen if and only if, after integrating
with respect to $x$, there exists a constant, say $\delta_0$, such that
$\sum_{i=1}^{m} \delta_i\psi_i(x) = \delta_0$. But this is a contradiction of the assumption
that $\psi_0(x),\cdots,\psi_m(x)$ are linearly independent. Therefore
$\delta M\delta' = 0$ if and only if $\delta = 0$, and so $M$ is positive definite,
and hence invertible.


Lemma 2. Let $X$ have the density (1). If $\psi_i \in C^2[a,b]$ then

$$\alpha_i = f(b)\psi_i'(b) - f(a)\psi_i'(a) - E[\psi_i''(X)].$$

Proof. Using integration by parts on (3), we obtain:

$$\alpha_i = \int_a^b [f'(x)/f(x)]\,\psi_i'(x)\,f(x)\,dx = \int_a^b \psi_i'(x)\,f'(x)\,dx
= \psi_i'(x)f(x)\Big|_a^b - \int_a^b \psi_i''(x)\,f(x)\,dx,$$

from which the lemma follows immediately.


Lemma 3. Let $X$ have density (1), and let
$c_k = E[\cos\{k\pi(X-a)/(b-a)\}]$. Then

$$f(a) = \Big[1 + 2\sum_{k=1}^{\infty} c_k\Big]/(b-a), \quad\text{and}\quad
f(b) = \Big[1 + 2\sum_{k=1}^{\infty} (-1)^k c_k\Big]/(b-a).$$

Proof. Let $e(x)$ be the even function constructed from $f(x)$
as follows:

$$e(x) = f(a+x) \ \text{for}\ x \in [0,b-a], \qquad e(x) = f(a-x) \ \text{for}\ x \in [a-b,0].$$

The Fourier series expansion of $e(x)$ exists and has the form

$$e(x) = u_0/2 + \sum_{k=1}^{\infty} [u_k \cos\{k\pi x/(b-a)\} + v_k \sin\{k\pi x/(b-a)\}],$$

where

$$u_k = \int_{a-b}^{b-a} \cos\{k\pi x/(b-a)\}\,e(x)\,dx/(b-a)
= 2\int_0^{b-a} \cos\{k\pi x/(b-a)\}\,f(a+x)\,dx/(b-a)$$
$$\phantom{u_k} = 2\int_a^b \cos\{k\pi(x-a)/(b-a)\}\,f(x)\,dx/(b-a)
= 2c_k/(b-a) \quad\text{for}\ k = 0,1,2,\cdots,\ \text{and}$$

$$v_k = \int_{a-b}^{b-a} \sin\{k\pi x/(b-a)\}\,e(x)\,dx/(b-a) = 0 \quad\text{for}\ k = 1,2,3,\cdots,$$

since sine is an odd function and $e(x)$ is an even function.
Using the fact that $c_0 = 1$, the expansion of $e(x)$ becomes

$$e(x) = \Big[1 + 2\sum_{k=1}^{\infty} c_k \cos\{k\pi x/(b-a)\}\Big]/(b-a).$$

The lemma follows from $f(a) = e(0)$ and $f(b) = e(b-a)$, since
$\cos(k\pi) = (-1)^k$.
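A quick numerical check can make Lemma 3 concrete. The sketch below is only an illustration under stated assumptions: the density $f(x) = e^x/(e-1)$ on $[0,1]$ is chosen for convenience, the cosine moments $c_k$ are computed by Simpson quadrature rather than from data, and the helper names (`simpson`, `endpoint_values`) are hypothetical. It compares the partial sums of the two series with the true endpoint values.

```python
import math

def simpson(fn, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = fn(a) + fn(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * fn(a + i * h)
    return total * h / 3

def endpoint_values(f, a, b, p):
    """Partial sums (k <= p) of the two series in Lemma 3."""
    length = b - a
    # c_k = E[cos(k*pi*(X - a)/(b - a))], evaluated by quadrature.
    c = [simpson(lambda x, k=k: math.cos(k * math.pi * (x - a) / length) * f(x),
                 a, b)
         for k in range(1, p + 1)]
    f_a = (1 + 2 * sum(c)) / length
    f_b = (1 + 2 * sum((-1) ** k * ck
                       for k, ck in enumerate(c, start=1))) / length
    return f_a, f_b

# Illustrative density (an assumption for this example): e^x / (e - 1) on [0, 1].
f = lambda x: math.exp(x) / (math.e - 1)
approx_a, approx_b = endpoint_values(f, 0.0, 1.0, p=100)
true_a, true_b = f(0.0), f(1.0)
```

Because the even extension $e(x)$ has kinks at the endpoints, the coefficients decay only like $1/k^2$, so the truncation error of the partial sums shrinks roughly like $1/p$.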


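These lemmas, together with the previously established
relationship $\tau M = \alpha$, yield the following theorem.

Theorem. Let $X$ have density (1) with $\psi_i \in C^2[a,b]$ for
$i = 1,\cdots,m$. Define

$$c_k = E[\cos\{k\pi(X-a)/(b-a)\}],$$
$$d_a = \Big[1 + 2\sum_{k=1}^{\infty} c_k\Big]/(b-a),$$
$$d_b = \Big[1 + 2\sum_{k=1}^{\infty} (-1)^k c_k\Big]/(b-a),$$
$$\alpha_i = d_b\,\psi_i'(b) - d_a\,\psi_i'(a) - E[\psi_i''(X)], \quad\text{and}$$
$$M_{ij} = E[\psi_i'(X)\psi_j'(X)], \quad\text{for}\ i = 1,\cdots,m,\ \text{and}\ j = 1,\cdots,m.$$

Then $\tau = \alpha M^{-1}$.

Proof. Follows immediately from the three lemmas given above.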


In practice the constants $d_a$ and $d_b$ are approximated by
the finite series

$$\hat{d}_a = \Big[1 + 2\sum_{k=1}^{p} \hat{c}_k\Big]/(b-a), \qquad
\hat{d}_b = \Big[1 + 2\sum_{k=1}^{p} (-1)^k \hat{c}_k\Big]/(b-a),$$

where $0 < p \le n$. Larger values of $p$ (up to a maximum of $n$)
will, of course, yield better estimates. The moments $c_k$ for
$k = 1,\cdots,p$ are estimated by

$$\hat{c}_k = (1/n)\sum_{j=1}^{n} \cos\{k\pi(X_j-a)/(b-a)\}.$$

Similarly, the matrix $M$ is estimated by

$$\hat{M}_{ij} = (1/n)\sum_{k=1}^{n} \psi_i'(X_k)\,\psi_j'(X_k),$$

while the vector $\alpha$ is estimated by

$$\hat{\alpha}_i = \hat{d}_b\,\psi_i'(b) - \hat{d}_a\,\psi_i'(a) - (1/n)\sum_{j=1}^{n} \psi_i''(X_j).$$

The estimated value of $\tau$ so obtained may be used as the first
estimate in the Newton-Raphson procedure, if the MLE's are needed.
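The estimation recipe above translates almost line by line into code. The following is a hedged sketch, not the author's program: the function names (`solve_linear`, `direct_estimate`) are hypothetical, the caller is assumed to supply the first and second derivatives of each $\psi_i$ as Python callables, and the "sample" in the usage lines is a deterministic grid of quantiles of the truncated exponential density with $\tau = 1$ on $[0,1]$, used only so that the result is reproducible.

```python
import math

def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(A[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def direct_estimate(xs, a, b, dpsi, d2psi, p):
    """Non-iterative estimate tau_hat = alpha_hat * M_hat^{-1}.

    dpsi[i] and d2psi[i] are the first and second derivatives of psi_i.
    """
    n, m, length = len(xs), len(dpsi), b - a
    # Sample cosine moments c_1, ..., c_p.
    c = [sum(math.cos(k * math.pi * (x - a) / length) for x in xs) / n
         for k in range(1, p + 1)]
    d_a = (1 + 2 * sum(c)) / length                      # estimate of f(a)
    d_b = (1 + 2 * sum((-1) ** k * ck
                       for k, ck in enumerate(c, start=1))) / length  # f(b)
    M = [[sum(dpsi[i](x) * dpsi[j](x) for x in xs) / n for j in range(m)]
         for i in range(m)]
    alpha = [d_b * dpsi[i](b) - d_a * dpsi[i](a)
             - sum(d2psi[i](x) for x in xs) / n for i in range(m)]
    # M is symmetric, so tau M = alpha is the same system as M tau' = alpha'.
    return solve_linear(M, alpha)

# Usage: psi(x) = x, so psi' = 1 and psi'' = 0; true tau is 1 on [0, 1].
n = 2000
xs = [math.log(1 + (j + 0.5) / n * (math.e - 1)) for j in range(n)]
tau_hat = direct_estimate(xs, 0.0, 1.0, [lambda x: 1.0], [lambda x: 0.0], p=40)
```

For this one-parameter case the estimator reduces to $\hat{d}_b - \hat{d}_a$, and the remaining discrepancy from $\tau = 1$ comes mainly from truncating the cosine series at $p$ terms. With a genuine random sample the sampling error of the $\hat{c}_k$ would be added on top of that.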


3. AN EXAMPLE 


The generalized Inverse Gaussian probability density
function has been used to describe the waiting times between
discharges of neurons (Yang and Chen, 1978). In its exponential form
it is written

$$f(x) = \exp[\tau_1 \ln(x) + \tau_2 x + \tau_3 x^2 - \eta(\tau)],$$

for $x \in (0,\infty)$. Neurons have a refractory interval after
discharge, of length $a$ (say), and a neuron is observed for a finite
length of time, say $b$, after each discharge. Thus the task is
to estimate $\tau$ given observations of $n$ independent neural
discharges, where the density is defined on $[a,b]$. Let $\{x_j\}$
be the observations. In this case the matrix $M$ is:

$$M = \begin{pmatrix} E[X^{-2}] & E[X^{-1}] & 2 \\ E[X^{-1}] & 1 & 2E[X] \\ 2 & 2E[X] & 4E[X^2] \end{pmatrix},$$

and the vector $\alpha$ is

$$\alpha = [\,d_b/b - d_a/a + E[X^{-2}],\ \ d_b - d_a,\ \ 2b\,d_b - 2a\,d_a - 2\,].$$

Note that the estimator $\hat{\alpha}\hat{M}^{-1}$ does not require computation
of $\eta(\tau)$, nor does it require estimation of any high order moments
of $X$. On the other hand, it does require the calculation of a
few sinusoidal moments $c_k$, in order to obtain approximations for
$d_a$ and $d_b$.
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PROPERTIES OF THE MAXIMUM LIKELIHOOD ESTIMATOR 
OF A MIXING DISTRIBUTION 


BRUCE G. LINDSAY 


Department of Statistics 
The Pennsylvania State University 
University Park, Pennsylvania 16802 USA 


SUMMARY. Given a random sample from a mixture density, the objective
is to estimate the mixing distribution by maximum likelihood.
Certain properties of this estimator are identified. The estimator 
is characterized by a family of inequalities which correspond to 
the usual parametric likelihood equations. If the atomic densi- 
ties underlying the mixture are of exponential type, it is demon- 
strated that the estimator matches the first sample moment to the 
first theoretical moment evaluated at the estimator. However, 

the sample variance is related to the theoretical variance by an 
inequality. Following this structural analysis, several algorithms 
for the estimator are discussed. The paper concludes with an 
example and a discussion of the difficulties of an asymptotic 
distribution theory. 


KEY WORDS. Mixtures, mixing distributions, maximum likelihood 
estimator, exponential family. 


1. INTRODUCTION 


Let $\{f(x;\phi): \phi \in \Omega\}$ be a family of density functions, with
respect to some sigma finite measure, for random variable $X$. It
will be assumed that $\Omega$ is a measurable space so that we may
define $PM(\Omega)$ to be the family of probability measures on $\Omega$.
Typically, $\Omega$ will be an interval of the real line with the usual
measurable sets. In this setting the function

$$f(x;Q) = \int f(x;\phi)\,dQ(\phi), \qquad Q \ \text{in}\ PM(\Omega),$$

also defines a density for $X$. If the measure $Q$ were known, it
would be the prior; in this setting it is assumed unknown. The
family of density functions so generated will be called the mixture
family generated by density $f$. The original family of densities
are a subfamily of the mixture family generated by using the atom
measures $\delta(\phi)$:

$$f(x;\delta(\phi)) = f(x;\phi). \tag{1}$$

Henceforth, it will be assumed that the atom measures are in
$PM(\Omega)$. The atomic densities (1) correspond to an underlying
simple process. Data generated by a non-atomic measure $Q$
corresponds to the observed mixture of simple processes.


The observed data $(x_1,\cdots,x_n)$ will be a random sample from
the mixture density and the objective will be to estimate $Q$ by
finding a probability measure $\hat{Q}$ in $PM(\Omega)$ which maximizes the
likelihood:

$$L(Q) = \prod_{i=1}^{n} f(x_i;Q).$$

If any of the $f(x_i;\phi)$ are unbounded in $\phi$, then the likelihood
$L(Q)$ is unbounded, and so no maximum exists. Henceforth, it will
be assumed that for each $i$, $f(x_i;\phi)$ is bounded. Notice that it
is possible for $\hat{Q}$ to be an atom $\delta(\hat{\phi})$, in which case $\hat{\phi}$ must
be the maximum likelihood estimator of $\phi$ within the atomic family
of densities $f(x;\phi)$.


It turns out to be sufficient for maximization purposes to
restrict attention to discrete measures $Q$ with a finite set of
points of positive probability (Lindsay, 1980). For convenience
a dual notation for discrete measures has been adopted herein.
Each such measure which has $m$ points of positive probability
will be said to have support size $m$. There will be an $m$-vector
of weights $\pi$ and an $m$-vector of support points $\phi$ such that
$Q = \sum_{j=1}^{m} \pi_j\,\delta(\phi_j)$. In this case

$$f(x;Q) = \sum_{j=1}^{m} \pi_j\,f(x;\phi_j).$$

A natural issue which arises within the mixture family is the
identifiability of the mixture $Q$. These issues have been explored
in Teicher (1963). However, even if the mixtures $Q$ are
themselves unidentifiable there generally will be parameters of the
mixture system which will be identifiable and estimable by the
method of maximum likelihood. For example, if $f(x;\phi)$ is itself
a discrete density for $X$ then the cell probability functions,

$$\int f(x;\phi)\,dQ(\phi),$$

are estimable in an asymptotically consistent fashion. A
particular example of this simple structure is discussed in
Section 4.


The major new results of this paper are found in the following 
section, where the mathematical structure of the maximum likeli- 
hood estimator of the mixture is examined. Central to the section 
is a theorem which characterizes the estimator as satisfying a 
derivative inequality similar in nature to the likelihood equa- 
tions of the parametric models. Focus is put upon results holding 
for members of the exponential class of densities, resulting in 
first and second moment properties. 


The third section of the paper is a review of computational 
methods for the maximum likelihood estimator. The fourth section 
is an identification of issues and difficulties associated with 
establishing a limiting distribution theory to be used for in- 
ference in the mixture family. The discussion is motivated by a 
simple mixture model arising in genetics. 


2. STRUCTURAL PROPERTIES 


2.1 Prior Results. The first substantial theoretical developments
concerning the maximum likelihood estimator of a mixture were by
Kiefer and Wolfowitz (1956). The problem, as they stated it,
involved an additional parameter, $\theta$, so that the density was
of the form

$$\int f(x;\theta,\phi)\,dQ(\phi).$$

Although their results concerned simultaneous estimation of $\theta$
and $Q$ by maximum likelihood, they apply as well to the case
where $\theta$ is absent. Their paper established quite general
conditions for the consistency of the maximum likelihood estimator.
Little more was said about its structure, however, and nothing
was said about its computation.


Although in the intervening years there was some development
of maximum likelihood methods for mixture models of fixed support
size $m$, the next substantial development for the general problem
came twenty years later in the work of Simar (1976). Simar
exclusively considered the compound Poisson family. For this
family it was demonstrated that a maximizing measure $\hat{Q}$ exists
and is unique. It is a discrete measure. Simar gives a bound on
support size which has since been improved by Lindsay (1980). Simar
also demonstrated consistency: the maximum likelihood estimator
$\hat{Q}$ converges weakly with probability one to the true measure
$Q$.

Laird (1978) extended some of Simar's results to the general
problem. In particular, a self-consistency property was
identified (to be discussed in Section 2.2) which leads to an extensive
set of conditions under which the mixture maximum likelihood
estimator was a discrete measure. It was conjectured that for
"well-behaved (analytic) unimodal densities" there would be no
more support points needed than observations in the sample.
Lindsay (1980) has shown this to be true for virtually all
densities.


2.2 Properties of Local Maxima. In this section certain
properties of local maxima of the likelihood are derived, where the
following is meant by local. Fix the size of the support set at
$m$. Now maximize the likelihood over discrete measures of the
form:

$$Q = \sum_{j=1}^{m} \pi_j\,\delta(\phi_j).$$

A local maximum is defined to be a maximum which is local in the
$2m-1$ dimensional space of the parameters $(\pi,\phi)$.


In the following material it is convenient notationally to
treat the parameter $\phi$ as a random variable $\Phi$ which takes on
values according to the probability distribution $Q$. In this
setting the mixture density is the marginal distribution of $X$ in
the random pair $(X,\Phi)$. The posterior expectation of a function
$h(\Phi)$ is then:

$$E\{h(\Phi) \mid X = x; Q\} = \frac{\int h(\phi)\,f(x;\phi)\,dQ(\phi)}{\int f(x;\phi)\,dQ(\phi)}.$$

The following theorem identifies certain consequences of
maximization within the fixed support size framework.


Theorem. Let $\hat{Q}$ be an $m$-point local maximum. (a) (Self-
consistency) If $g(\phi)$ is an arbitrary function on $\Omega$ then

$$n^{-1}\sum_{i=1}^{n} E\{g(\Phi) \mid X = x_i; \hat{Q}\} = E\{g(\Phi); \hat{Q}\}. \tag{2}$$

(b) If $\Omega$ is an interval of $R$ and if $U(\phi;x) = \partial(\ln f)/\partial\phi$ is
everywhere well-defined, then for any function $h(\phi)$ which is
zero on the boundaries of $\Omega$ one has

$$n^{-1}\sum_{i=1}^{n} E\{h(\Phi)\,U(\Phi;X) \mid X = x_i; \hat{Q}\} = 0. \tag{3}$$
f=] 


Proof. Equations (2) and (3) are the respective likelihood
equations for the two parametric families of mixing distributions,
each defined for $\tau$ in a neighborhood of zero, given by

$$dQ_\tau(\phi) = \exp\{\tau g(\phi)\}\,d\hat{Q}(\phi) \Big/ \int \exp\{\tau g(\phi)\}\,d\hat{Q}(\phi),$$

and $dP_\tau(\phi) = d\hat{Q}(\phi + \tau h(\phi))$.

By definition the point $\tau = 0$ must maximize in both families.
Suppose that the atomic density $f$ is of the form

$$f(x;\phi) = \exp[\phi x - c(\phi)] \tag{4}$$

with respect to some sigma finite measure. Then $f$ is a member
of the exponential class of distributions, with $\phi$ the natural
parameter. Properties of this family have been treated extensively
elsewhere; for example Barndorff-Nielsen (1978).


Corollary (first moment). Suppose $f$ is of the exponential form
(4). Any $m$-point maximum $\hat{Q}$ which has no support points on the
boundary of $\Omega$ satisfies

$$E\{X; \hat{Q}\} = E\{c'(\Phi); \hat{Q}\} = \bar{x}. \tag{5}$$

Proof. Use $h(\phi) = 1$ in (3) and $g(\phi) = c'(\phi)$ in (2) to obtain the
results.


2.3 Properties of Global Maxima. The special properties of those
mixtures in $PM(\Omega)$ which globally maximize the likelihood will
now be presented. In this section we consider properties resulting
solely from convexity. In the following section regularity
conditions will be assumed.

The following characterization theorem can be found, albeit
in several pieces, in Simar (1976) for the special case of the
Poisson model. For mixture $Q$ in $PM(\Omega)$, define the following
function:

$$D(\phi;Q) = \sum_{i=1}^{n} \{f(x_i;\phi)/f(x_i;Q) - 1\}.$$

The function $D$ is the Gateaux derivative of $\ln L(Q)$ with
respect to $Q$ and so the inequality below (6) is the mixture
model version of the usual likelihood equation:
$\sum_{i=1}^{n} U(\phi;x_i) = 0$.


Characterization Theorem. (a) The mixture $\hat{Q}$ maximizes the
likelihood $L(Q)$ if and only if

$$D(\phi;\hat{Q}) \le 0 \quad\text{for all}\ \phi\ \text{in}\ \Omega. \tag{6}$$

(b) The fitted likelihood values $f(x_i;\hat{Q})$ do not depend on
the choice of maximizing measure $\hat{Q}$.

(c) Let $D^*(\phi) = D(\phi;\hat{Q})$, where by part (b) $\hat{Q}$ may be any
maximizing measure. Then all maximizing measures have
support in the set

$$S = \{\phi: D^*(\phi) = 0\}.$$
Proof. Let $P$ be an arbitrary element of $PM(\Omega)$. Let
$P_\tau = (1-\tau)\hat{Q} + \tau P$, $\tau \in [0,1]$.
The densities $f(x;P_\tau)$ create a parametric subfamily of the
mixture family, with score function:

$$\frac{\partial}{\partial\tau} \ln L(P_\tau) = \sum_{i=1}^{n} \frac{f(x_i;P) - f(x_i;\hat{Q})}{f(x_i;P_\tau)}$$

and score derivative:

$$\frac{\partial^2}{\partial\tau^2} \ln L(P_\tau) = -\sum_{i=1}^{n} \frac{(f(x_i;P) - f(x_i;\hat{Q}))^2}{f(x_i;P_\tau)^2}. \tag{7}$$

Part (a). It is clear that if the mixture $\hat{Q}$ is a maximum
likelihood estimator, then $\tau = 0$ is a maximal point within all the
parametric subfamilies generated by varying $P$. The concavity of
$\ln L(P_\tau)$ in $\tau$, equation (7), ensures that the latter condition
is also sufficient for $\hat{Q}$ being maximal. The concavity in $\tau$
also implies that the maximum occurs at $\tau = 0$ if and only if

$$\frac{\partial}{\partial\tau} \ln L(P_\tau)\Big|_{\tau=0} \le 0. \tag{8}$$

The last inequality will hold for all $P$ if and only if (6) holds.

Part (b). Assume that there exists $P$ such that $L(\hat{Q}) = L(P)$,
where $\hat{Q}$ is maximal. The score derivative (7) will be strictly
negative unless $f(x_i;\hat{Q}) = f(x_i;P)$ for all $i$. If it is negative,
then strict concavity implies that there exists $\tau$ in (0,1)
with more likelihood than the endpoints, a contradiction.

Part (c). If $\phi^*$ is in the support of a maximal measure $\hat{Q}$, then the
parametric subfamily with $P = \delta(\phi^*)$ is defined for $\tau$ in an
entire neighborhood of zero. Hence, (8) can be strengthened to
equality for this $P$.
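Condition (6) gives a practical way to check whether a candidate mixture is a maximum likelihood estimator: evaluate $D(\phi;Q)$ over a grid of $\phi$ values and look for positive values. The sketch below is an illustration only, under several assumptions: the atomic family is taken to be Poisson and is indexed by its mean rather than by the natural parameter, the function names are hypothetical, and the sample is an artificial one with all observations equal to 3, for which the single atom at 3 satisfies (6).

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability of the count x when the mean is lam > 0."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def mixture_pmf(x, weights, means):
    """f(x; Q) for the discrete mixing measure Q = sum_j weights[j] * delta(means[j])."""
    return sum(w * poisson_pmf(x, lam) for w, lam in zip(weights, means))

def gradient(phi, xs, weights, means):
    """D(phi; Q) = sum_i { f(x_i; phi) / f(x_i; Q) - 1 }."""
    return sum(poisson_pmf(x, phi) / mixture_pmf(x, weights, means) - 1.0
               for x in xs)

xs = [3, 3, 3, 3]
grid = [0.1 * k for k in range(1, 101)]            # phi from 0.1 to 10.0

# Candidate 1: a single atom at 3, which is the maximizer for this sample.
d_at_mle = [gradient(phi, xs, [1.0], [3.0]) for phi in grid]

# Candidate 2: a single atom at 1, which is not a maximizer.
d_at_other = [gradient(phi, xs, [1.0], [1.0]) for phi in grid]
```

For the first candidate the values never rise above zero (up to rounding) and touch zero at the support point, as parts (a) and (c) require. For the second candidate the function is positive near $\phi = 3$, which signals that moving mass toward 3 would increase the likelihood.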


Suppose the set $\{x_1,\cdots,x_n\}$ consists of $K$ distinct
values $y_1,\cdots,y_K$. Let $\hat{f}$ be the transpose of the row vector
$(f(y_1;\hat{Q}),\cdots,f(y_K;\hat{Q}))$. Since the fitted values $\hat{f}$ are unique,
it is clear that if $S = \{\phi_1,\cdots,\phi_J\}$ then the maximum likelihood
estimator $\hat{Q}$ is unique if and only if there is a unique $J$-vector
of weights $\pi$ (elements nonnegative and summing to one) satisfying
$A\pi = \hat{f}$, where $A$ is the $K \times J$ matrix with $(k,j)$th entry
$f(y_k;\phi_j)$. Thus it is always possible, once the maximization has
been done, to determine if the estimator is unique. In
particular, if $J \le K$ and $A$ is of full rank, then $\hat{Q}$ is unique.
Lindsay (1980) has shown that there is an a priori guarantee of
uniqueness for atomic densities of the exponential class.


2.4 Properties Under Regularity. In this section the
characterization results of the last subsection are combined with regularity
assumptions to obtain several new properties. In this and the
following subsection, it will be assumed that $\Omega$ is an interval
and the atomic densities $f(x;\phi)$ are twice differentiable in $\phi$
for all $x$ in the sample space. The extensions to vector-valued
$\phi$ are generally clear. Since one can write

$$D^*(\phi) = \sum_{i=1}^{n} w_i\,f(x_i;\phi) - n,$$

where the $w_i$ are functions of the data, the regularity of $D^*$
follows from that of $f$. The following result is a corollary to
the characterization theorem.

Corollary. If the support point $\phi^*$ is in the interior of $\Omega$, then

$$D^{*\prime}(\phi^*) = 0 \quad\text{and}\quad D^{*\prime\prime}(\phi^*) \le 0. \tag{9}$$

The characterization theorem can also be used to bound the
range of potential support points for $\hat{Q}$.


Proposition. Suppose that for each $i$ the function $f(y_i;\phi)$
has a unique mode $\phi_i$ in $\Omega$. Let $\mu_L$ be the minimum and $\mu_U$
be the maximum of $\{\phi_1,\cdots,\phi_K\}$. Then $\hat{Q}$ must have its support
in the range $[\mu_L,\mu_U]$.

Proof. By the hypothesis $D^*(\phi)$ must be strictly increasing
when $\phi < \mu_L$ and strictly decreasing when $\phi > \mu_U$. Since support
points are maximal points of $D^*$, they cannot occur outside the
given range.

The generalization of this result to multivariate $\phi$ is that
if each of the functions $f(y_i;\phi)$ has a single mode point $\phi_i$,
then the support points for $\hat{Q}$ must lie in the convex hull of
the points $\phi_1,\cdots,\phi_K$.


2.5 The Second Moment Property. In Section 2.2 it was found
that for the exponential class of densities the mixture maximum
likelihood estimator, or indeed any local maxima, matched the
first sample moment with the theoretical first moment under $\hat{Q}$.
It is now asked if there is a further agreement of moments.
After all, the empirical distribution function, which puts mass
$n^{-1}$ at each sample point, is both the maximum likelihood estimator
and the method of moments estimator of an unknown discrete
distribution. The answer for the mixture problem is that there is a
second moment inequality rather than an equality. Before turning
to the exponential class a general theorem is proved.


Theorem. Let $\hat{Q}$ be a mixing distribution which maximizes the
likelihood. If $h(\phi)$ is any function which is zero on the
boundary of $\Omega$, or if $\hat{Q}$ has no support points on the boundary
of $\Omega$, then

$$n^{-1}\sum_{i=1}^{n} E\{h^2(\Phi)\,(U^2(\Phi;X) + U'(\Phi;X)) \mid X = x_i; \hat{Q}\} \le 0. \tag{10}$$
i=1 


Proof. If $\phi_j$ is the $j$th element in the support of $\hat{Q}$ and it has
weight $\pi_j$, then by inequality (9)

$$\sum_{i=1}^{n} \frac{f''(x_i;\phi_j)\,h^2(\phi_j)\,\pi_j}{f(x_i;\hat{Q})} \le 0.$$

Summing these inequalities over $j$ yields the inequality

$$\sum_{i=1}^{n} E\{h^2(\Phi)\,f''(X;\Phi)/f(X;\Phi) \mid X = x_i; \hat{Q}\} \le 0.$$

The proof is finished by noting that $f''(x;\phi)/f(x;\phi) = U^2(\phi;x) + U'(\phi;x)$.


Corollary. (Variance inequality) Let $s^2$ denote the sample
variance. If $\hat{Q}$ has no support points on the boundary of $\Omega$ and
if the atomic densities are of exponential form (4), then

$$s^2 \le \mathrm{Var}(X;\hat{Q}). \tag{11}$$

Proof. Use $h(\phi) = 1$ in (10) and then simplify, using equations
(2), (3) and (5).


3. COMPUTATION 


This section provides a brief introduction to the computational
methods currently recommended for the estimation by maximum
likelihood of the mixture Q. It will be seen that the mixture is
readily estimated, particularly for densities of the exponential
class.


3.1 EM Algorithm. This general algorithm was formally identified 
and characterized by Dempster, Laird, and Rubin (1976), and the 
application to the fixed support size mixture estimation problem 
was pointed out there. The algorithm is directly appropriate only 
for the fixed support size problem. Laird (1978) recommends it, 
however, based on the observation that since the maximum likeli- 
hood estimator is discrete, one need only be sure that one chooses 
the support size m sufficiently large. It is suggested that one 
either start with m small and work upwards until the likelihood 
stops increasing or one start with m large, say near n. The 
last approach will work because the algorithm converges along a 
ridge; if m is too large, then it will either drive elements
of the $\pi$ vector to zero or create duplication in the support set.


The technique is iterative and simply described.

Step one. Given estimators $(\pi^{(r-1)}, \phi^{(r-1)})$ defining mixture
$Q^{(r-1)}$, compute the following conditional probabilities:

$$\pi_{ij}^{(r)} = P\{\Phi = \phi_j^{(r-1)} \mid X = x_i;\ Q^{(r-1)}\} = \pi_j^{(r-1)} f(x_i;\phi_j^{(r-1)}) / f(x_i;Q^{(r-1)})$$

for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$.


Step two. The new estimators are

$$\pi_j^{(r)} = \sum_{i=1}^{n} \pi_{ij}^{(r)} / n, \quad j = 1, \ldots, m,$$

and the solutions $\phi_j^{(r)}$ to

$$\sum_{i=1}^{n} \pi_{ij}^{(r)}\,U(\phi_j^{(r)}; x_i) = 0, \quad j = 1, \ldots, m. \tag{12}$$

If the solutions have not converged, return to step one. 


In the exponential class of densities, the score equation (12)
gives a weighted mean of the x-sample:

$$E[X;\phi_j^{(r)}] = \sum_{i=1}^{n} \pi_{ij}^{(r)} x_i / (n\pi_j^{(r)}), \quad j = 1, \ldots, m.$$

The last equation implies

$$E[X;Q^{(r)}] = \sum_{j=1}^{m} \pi_j^{(r)} E[X;\phi_j^{(r)}] = \bar{x},$$


so that the EM algorithm satisfies the moment equation (5) from 
the first iteration onwards. 
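
The two steps above can be sketched in a few lines for one concrete member of the exponential class. The block below is only an illustration under stated assumptions: the atomic density is taken to be Poisson with mean $\phi$ (so that the solution of the score equation is a weighted mean of the sample), and the function names `poisson_pmf`, `em_step` and `log_likelihood` are hypothetical, not taken from the text.

```python
import math


def poisson_pmf(x, phi):
    """Assumed atomic density f(x; phi): Poisson with mean phi > 0."""
    return math.exp(-phi + x * math.log(phi) - math.lgamma(x + 1))


def em_step(xs, weights, support):
    """One EM iteration for a mixture with fixed support size m."""
    n, m = len(xs), len(support)
    # Step one: conditional probabilities pi_ij of component j given x_i.
    post = []
    for x in xs:
        row = [weights[j] * poisson_pmf(x, support[j]) for j in range(m)]
        total = sum(row)  # mixture density f(x_i; Q)
        post.append([r / total for r in row])
    # Step two: new weights, then weighted means as new support points.
    new_weights = [sum(post[i][j] for i in range(n)) / n for j in range(m)]
    new_support = []
    for j in range(m):
        mass = n * new_weights[j]
        if mass > 0:
            new_support.append(sum(post[i][j] * xs[i] for i in range(n)) / mass)
        else:
            new_support.append(support[j])  # empty component: leave in place
    return new_weights, new_support


def log_likelihood(xs, weights, support):
    """Log likelihood of the sample under the discrete mixture."""
    return sum(
        math.log(sum(w * poisson_pmf(x, p) for w, p in zip(weights, support)))
        for x in xs
    )
```

Starting from any weights and positive support points, repeated calls to `em_step` should never decrease `log_likelihood`, and the mixture mean `sum(w * p)` equals the sample mean after the first call, in line with the moment property just noted.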


The EM algorithm has several attractive features. There are
no matrix inversions or complicated calculations to be performed.
The likelihood will increase with every iteration, so minima are
not stable solutions. Most importantly, as described above, it
will converge on a ridge. Its disadvantages include a relatively
slow rate of convergence and a potential for terminating at a purely
local maximum. There is also the problem of determining a set
of starting values (Laird, 1978, recommends an equally spaced grid
of $\phi$-values) and the wastefulness of searching blindly for the
correct support size.




3.2 Stepwise Inclusion. Simar (1976) devised an iterative
algorithm for the Poisson model which used the characterization
theorem. It could be generalized as follows:


Step one. Given $Q^{(r-1)}$ with support set $S^{(r-1)}$, find a
maximum $\phi^*$ to $D(\phi; Q^{(r-1)})$. If $D(\phi^*; Q^{(r-1)}) = 0$,
then stop.


Step two. Let $S^{(r)} = S^{(r-1)} \cup \{\phi^*\}$. Maximize the
likelihood over the weight vector for the support set $S^{(r)}$,
yielding estimator $Q^{(r)}$. Return to step one.


It should be noted that Simar also uses two subalgorithms to
reduce the size of the support set if that size exceeds certain
theoretical bounds. Although this speeds convergence, the
maximization over the weight vector in step two will sometimes
delete points from $S^{(r)}$ and so the support size is not indefinitely
increasing. Lindsay (1980) has improved Simar's bounds and has
given bounds for the entire exponential family.


The Simar algorithm has an important advantage; if the
estimators converge, then it must be to the full maximum, not a
local one. Also, the estimator may be started at a small support
size -- for example, the one-point maximum likelihood estimator
$\hat{\phi}$ -- and the algorithm will rather quickly adapt if a large
number of support points are needed. Although the algorithm
monotonely increases the likelihood, Simar noted that the
convergence of the algorithm was difficult to prove and did not do so.
Other disadvantages include the rather complicated maximization of
$D(\phi;Q)$ at each step and the possible inefficiency of adding
support points when it is a shift in the support points that is
needed. This algorithm, in contrast to the EM algorithm, does
not satisfy the first moment equation in one iteration.


3.3 A Synthesis. The computation of the estimator is feasible,
but it appears an optimal strategy has not yet been identified.
In the interim, a synthesis of the Simar and the Laird approach
which combines the computational simplicity of the EM algorithm
with the certainty of full, not local, maximization, is as follows:
Use the EM algorithm, with a moderate support size m, until
convergence to Q'. Then check $D(\phi;Q') \le 0$. If it is not true,
then add to the support of Q' all local maxima $\phi^*$ of $D(\phi;Q')$
which satisfy $D(\phi^*;Q') > 0$. The logic for this procedure is that
for $Q' = \hat{Q}$, the support points are all local maxima. Start
the EM algorithm again with the new support set. It should
be noted that the expense of computing the estimator for a large
sample size when a large support is needed has yet to be
established for this or any other method.
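
The checking step of the synthesis can be sketched as a grid search. The block below is a minimal illustration, not the procedure of Simar or Laird. It rests on two assumptions that this section does not spell out: that the gradient function has the form $D(\phi;Q) = \sum_i [f(x_i;\phi)/f(x_i;Q) - 1]$, and that the atomic density is Poisson. The names `gradient` and `candidate_support_points`, the grid, and the tolerance are all hypothetical.

```python
import math


def poisson_pmf(x, phi):
    """Assumed atomic density f(x; phi): Poisson with mean phi > 0."""
    return math.exp(-phi + x * math.log(phi) - math.lgamma(x + 1))


def gradient(phi, xs, weights, support):
    """Assumed form D(phi; Q) = sum_i [ f(x_i; phi) / f(x_i; Q) - 1 ]."""
    total = 0.0
    for x in xs:
        mix = sum(w * poisson_pmf(x, p) for w, p in zip(weights, support))
        total += poisson_pmf(x, phi) / mix - 1.0
    return total


def candidate_support_points(xs, weights, support, grid, tol=1e-8):
    """Grid points that are local maxima of D with D > tol.

    Grid end points are compared with their single neighbour only.
    An empty result is consistent with D <= 0 on the grid.
    """
    values = [gradient(phi, xs, weights, support) for phi in grid]
    found = []
    for k, val in enumerate(values):
        left = values[k - 1] if k > 0 else -math.inf
        right = values[k + 1] if k + 1 < len(values) else -math.inf
        if val > tol and val >= left and val >= right:
            found.append(grid[k])
    return found
```

In use, one would append the returned points to the support set, give them small starting weights, and restart the EM iterations. A finer grid or a one-dimensional optimizer could replace the grid search.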


4. LIMITING DISTRIBUTIONS


4.1 Introduction. The method of maximum likelihood has a long
history of providing useful and mathematically elegant solutions 
to problems in estimation. An important part of its usefulness, 
both in the parametric setting and in the nonparametric setting 
of the empirical distribution function, is the detailed body of 
limiting distribution theory to use for statistical inference. 
The maximum likelihood estimator of a mixture has been shown to 
be computationally feasible; it is also of mathematical and 
practical interest because of its blend of the parametric and 
the nonparametric. However, there is at present no limiting 
theory to be used for inference. In this section this subject is 
given a preliminary exploration, motivated by the following 
example. 


4.2 A Trinomial Example. The simplest possible non-trivial
mixture model is the trinomial. Let X have the following
discrete density:

$$f(0;\phi) = p_0(\phi), \quad f(1;\phi) = p_1(\phi), \quad f(2;\phi) = p_2(\phi) = 1 - p_0(\phi) - p_1(\phi).$$


The family of mixture densities is now described by the two 
parameters 


$$P_0(Q) = \int p_0(\phi)\,dQ(\phi) \quad \text{and} \quad P_1(Q) = \int p_1(\phi)\,dQ(\phi).$$


The set of permissible cell probabilities under the family of
atomic densities is defined by:
$G = \{(p_0(\phi), p_1(\phi)): \phi \in \Omega\}$. The
parameter space for the trinomial model is:

$$A = \{(p_0, p_1): p_0 \ge 0,\ p_1 \ge 0,\ p_0 + p_1 \le 1\}.$$


The parameter space for the mixture trinomial is simply the convex 
subset of A generated by the set G and all convex combinations 
of points in G. This set is denoted conv(G). Hence the 

mixture maximum likelihood estimation of the cell probabilities
$(P_0, P_1)$ is simply a constrained version of the full trinomial
likelihood problem.


A model of this form can be generated by a simple genetics 
problem. Suppose that a given locus of a chromosome in a diploid 
individual can have one of two alleles, A or a. Each individual 
in the population can then be identified as having one of three 
genotypes: AA, Aa, or aa. If the population frequency of A
is $\phi$, and an individual is picked at random from that population,
then according to the Hardy-Weinberg law the probabilities of that
individual being AA, Aa, or aa are $\phi^2$, $2\phi(1-\phi)$, and
$(1-\phi)^2$ respectively. If we identify $X = 0$ with AA, $X = 1$
with Aa, and $X = 2$ with aa, then a model for a sample from the
population is trinomial, with the above cell probabilities. If 
the sampling is done from a mixture of subpopulations with
different gene frequencies $\phi$, a state of genetic disequilibrium
for the whole population, then the true probabilities of each
genotype are mixtures of the Hardy-Weinberg frequencies. The
parameters $\int \phi^2\,dQ(\phi) = P_0(Q)$ and
$\int 2\phi(1-\phi)\,dQ(\phi) = P_1(Q)$


are identifiable and estimable by maximum likelihood. From them 
one can estimate the mean and variance of the mixing distribution 
Q. For the genetics problem the mean of Q is the population 
frequency and the variance is a measure of disequilibrium. 


For the trinomial problem, the maximum likelihood solution
proceeds as follows. Identify conv(G), the convex hull of G.
If there are $n_0$ zeroes in the sample and $n_1$ ones, then the
maximum likelihood estimator of $(P_0, P_1)$ in the full trinomial
model is $(n_0/n, n_1/n)$. It follows that if that point is in
conv(G), then it is also the mixture m.l.e. If it is not in
conv(G) then the concavity of the trinomial log likelihood
guarantees that a point on the boundary of conv(G) maximizes
the likelihood. For the genetics problem the boundary consists
of G itself, so in this case the mixture m.l.e. has a one-point
mixing distribution, $\hat{\phi} = (2n_0 + n_1)/(2n)$. It follows that

$$\hat{P}_0 = \max\{\hat{\phi}^2,\ n_0/n\}, \quad \hat{P}_1 = \min\{2\hat{\phi}(1-\hat{\phi}),\ n_1/n\}. \tag{13}$$
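
The closed form (13) is easy to evaluate directly. The sketch below assumes the Hardy-Weinberg cell probabilities of this section and takes the one-point estimate to be the usual allele frequency $(2n_0 + n_1)/(2n)$. The function name `hardy_weinberg_mixture_mle` is hypothetical. The mean and variance of Q are recovered from $E(\phi) = P_0 + P_1/2$ and $E(\phi^2) = P_0$.

```python
def hardy_weinberg_mixture_mle(n0, n1, n2):
    """Mixture m.l.e. of (P0, P1) from genotype counts (AA, Aa, aa).

    Returns (P0, P1, mean_Q, var_Q); var_Q is zero when the estimate
    falls on the one-point (Hardy-Weinberg) boundary.
    """
    n = n0 + n1 + n2
    phi_hat = (2 * n0 + n1) / (2 * n)           # one-point m.l.e. of phi
    p0_hat = max(phi_hat ** 2, n0 / n)
    p1_hat = min(2 * phi_hat * (1 - phi_hat), n1 / n)
    mean_q = p0_hat + p1_hat / 2                # E(phi) under Q
    var_q = p0_hat - mean_q ** 2                # E(phi^2) - E(phi)^2
    return p0_hat, p1_hat, mean_q, var_q
```

For counts (30, 20, 50) the sample proportions lie inside conv(G) and are returned unchanged, with a positive variance for Q. For counts (10, 60, 30) the excess of heterozygotes pushes the estimate onto the boundary, and the variance of Q is zero up to rounding.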


4.3 Boundary Difficulties. For parameter values $(P_0, P_1)$ in the
interior of I, there is no difficulty with the limiting
distribution theory for the mixture maximum likelihood estimators (13).
For large samples the estimators will almost always be
$(n_0/n, n_1/n)$ and the usual trinomial results will take over.
However, for atomic mixtures $\delta(\phi)$, whose parameter values are
on the boundary of I, the theory is different. For example, the
statistics $(\hat{\phi}^2, n_0/n)$ will have bivariate normal limiting
distribution, and hence $\hat{P}_0$, being the maximum of the two, will
not be normal. Thus if one does not restrict the parameter space to
the interior of I one is in the awkward position of having an
estimator whose limiting distribution depends on the unknown
true distribution in a discontinuous fashion.

This result demonstrates, and results from parametric maximum
likelihood theory substantiate, the necessity of identifying an
appropriate interior to the mixture parameter space PM(2) before
a satisfactory theory can be developed. It is conjectured that
distributions of the continuous type should be in that interior.
If so, the situation is reminiscent of the theory of the empirical
distribution function. There also, the maximum likelihood
estimator $\hat{F}_n$ is discrete, but the distribution-free limit
theory for $\hat{F}_n$ depends on F being a continuous distribution.


5. CONCLUDING DISCUSSION 


The subject of maximum likelihood estimation of mixtures 
seems to fall midway between two great bodies of statistical 
theory: The theory of maximum likelihood in the parametric case 
and the theory of the empirical distribution function. Each of 
these subjects has been found to have an elegant and profoundly 
useful structure. It is anticipated that such a theory could be 
developed for mixture estimation, which would greatly expand 
its potential. At present it can only be used in an exploratory 
fashion. Any further role will depend on the development of an
inferential theory. 


CONDITIONAL MAXIMUM LIKELIHOOD ESTIMATION IN
GAUSSIAN MIXTURES


GEORGE E. POLICELLO II* 


Department of Statistics 
The Ohio State University 


SUMMARY. The paper gives a brief history of the estimation pro- 
blem for mixtures of gaussian populations. The classical maximum 
likelihood approach is not appropriate since the likelihood func- 
tion is unbounded at each sample point. However, this does not 
seem to cause serious problems when iterative methods are used 

on a computer. This phenomenon is partially explained by the 
conditional likelihood approach taken in this paper. In addition, 
the conditional likelihood approach leads to consistent, asymp- 
totically normal and efficient estimators for the parameters of 
the mixture. The results of a Monte Carlo study are reported 

for the univariate case and these show that the procedures pro- 
vide reasonable estimators for small sample sizes. The methods 
are then extended to mixtures of multivariate gaussian populations. 


KEY WORDS. Gaussian mixtures, maximum likelihood estimation, 
mixtures of distributions, normal mixtures. 


1. INTRODUCTION 


In recent years mixture distributions have received consider- 
able attention as models for physically important processes in 
biology, clinical chemistry, ecology, economics, engineering,
fisheries, medicine, natural resources, etc. Boswell, Ord, and 
Patil (1979) provide a thorough discussion of the literature 
and of some sampling mechanisms leading to mixture distributions. 
Clinical chemistry is one area where mixture distributions are
being studied principally by nonstatisticians. Gindler (1970)
and Grannis and Lott (1978) are two typical examples of the work
in this area.

In this paper, the discussion is restricted to mixtures of
gaussian populations. Historically, Pearson (1894) appears to
be the first person to address the statistical questions
associated with a mixture of two univariate gaussian distributions.
Unfortunately, Pearson's method of moments approach requires the 
solution of a ninth order polynomial equation and leads to 
unstable solutions that could result in negative variance esti- 
mates. There were many attempts to overcome these problems. 
Pearson and Lee (1908-1909) used incomplete moments, Pearson 
(1915) worked with the moments again, Gottschalk (1948) used half 
moments, Rao (1948) improved the computational situation 
slightly and, more importantly, introduced some maximum likeli- 
hood alternatives, and Behboodian (1970) continued these efforts. 
Cohen (1967) created an interesting hybrid approach by introducing 
a conditional likelihood given the first four moments. Quandt and 
Ramsey (1978) introduced a clever extension of the method of 
moments using the moment generating function. 


The current resurgence of interest in the problem seems to
start with the maximum likelihood estimator (MLE) approach in
Hasselblad (1966). Macdonald (1969), Gridgeman (1970) and Fryer
and Robertson (1972) are of particular interest. The paper by
Day (1969) gives a comprehensive comparison of moment estimators, 
MLE's and Bayes estimators. Unfortunately, the maximum likelihood 
procedures usually require an assumption of equal covariance 
matrices for all of the component populations. 


Many of the estimation procedures require iterative methods to
obtain the final results. The problems associated with iterative
procedures have been studied by Chang (1974), Dick and Bowden (1973),
Hasselblad (1966), Hosmer (1973) and Peters and Walker (1978). In
a more general setting, Hasselblad (1969) develops iterative
procedures for mixtures of distributions in the exponential family.


John (1970) considers the problem of estimating the popula- 
tion of origin for a set of observations on a mixture distribu- 
tion, hence, relating these procedures to the cluster analysis 
problems. Rayment (1972) considers the identification problem. 
James (1978) and Ahmad, Giri and Sinha (1980) restrict their 
attention to estimation of the mixing parameter. In contrast, 
Tubbs and Coberly (1976) study the robustness of the mixing 
proportion estimator when the gaussian components are shifted. 


This survey of the literature cannot be completed without 
mentioning the methods based on graphical techniques. 
Bhattacharya (1967) provides useful techniques and many refer- 
ences of interest. Other papers dealing with graphical methods 
ag pe *°bks (1963), Harding (1949), Preston (1953) and Taylor 

ENO) Ve 

In this paper we introduce a conditional maximum likelihood
estimator based on ideas different from those in Cohen (1967). 
For an arbitrary mixture of two gaussian populations, the likeli- 
hood function is unbounded at each observation. In the next 
section we highlight the reasons for the failure of the procedures. 
Basically, they are trying to estimate the variance of a gaussian 
population using fewer than two observations. This structure is 
exploited to create the new estimation procedures in this paper. 


2. THE BASIC PROBLEM 


The notation required for the multivariate problem is quite 
complicated and lends nothing to an understanding of the basic 
principles involved in the estimation procedure. Thus, the 
theoretical development is restricted to the univariate case and 
the multivariate results are summarized in a separate section. 
Also there is no conceptual difficulty in extending the results 
to mixtures of more than two gaussian distributions; however, 
the necessary extra steps would unduly lengthen the exposition. 
Therefore, the main results are obtained for a mixture of two 
gaussian populations. 


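Let $\phi(z) = (2\pi)^{-1/2} \exp(-z^2/2)$ denote the standard
univariate gaussian density function. For $\mu_1$, $\mu_2$, $\sigma_1^2$,
$\sigma_2^2$ and $\alpha$ real numbers such that $\sigma_1^2 > 0$,
$\sigma_2^2 > 0$ and $0 < \alpha < 1$,

$$g(x) = (1-\alpha)\,\phi[(x-\mu_1)/\sigma_1]/\sigma_1 + \alpha\,\phi[(x-\mu_2)/\sigma_2]/\sigma_2$$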


is the density function for a mixture of two gaussian distribu- 
tions. Note that the notation suppresses the dependence of g(x) 
on the parameters. Let the vector of parameters be denoted by 


$\theta' = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \alpha)$. The usual
demonstration that there is no theoretically reasonable set of MLE's
for the five parameters in $\theta$ works because it is possible to
isolate the term, in the product expansion of the likelihood
function, $L$, that corresponds to one factor from population one
(parameters $\mu_1$ and $\sigma_1^2$) and $(n-1)$ factors from
population two. In this term, there is no companion exponential term
involving the variance estimate, $\sigma_1^2$, that would attenuate
the effect of $\sigma_1^2$ going to zero. In effect,
including this term in $L$ forces any algorithm to try to estimate
the variance of a gaussian population with fewer than two
observations. For reasonable sample sizes, situations like this

have small probabilities of occurring. Hence, many authors argue 
that this result is of little practical importance for many of 

the computer oriented computational schemes; see, especially, 


the individual discussions following the Quandt and Ramsey (1978)
paper. 


The above observations suggest a plan for modifying the MLE 
procedures to produce an estimation scheme that would overcome 
the unbounded likelihood problem and retain the, hopefully, good
properties of MLE's in general. The four offensive terms above, 
zero or one observation from one of the populations and all of the 
remainder from the other population, would not be present if the 
likelihood were computed conditional on at least two observations 
from each population. Using this idea, the resulting conditional 
likelihood function is bounded with probability one. The next 
section develops the associated estimation procedures. 


3. THE CONDITIONAL LIKELIHOOD APPROACH


Let $\theta' = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \alpha)$ and let
$x' = (x_1, \ldots, x_n)$ be a realization of X, a vector of n
independent observations from g(x). It is convenient to introduce a
compact notation. Let $\mathbf{n} = \{1, 2, \ldots, n\}$ and
$\psi = \{(r_1, r_2, r_3, r_4): (r_1, r_2, r_3, r_4)$ is
an ordered set of four distinct integers from $\mathbf{n}\}$. Thus,
there are $n!/(n-4)!$ distinct vectors in $\psi$. Let $\{a_i\}$,
$\{b_i\}$, $\{c_i\}$ and $\{d_i\}$ be four arbitrary sequences of n
real numbers. Then we use $\sum_{\psi} a_i b_j c_k d_l$ to denote

$$\sum_{(i,j,k,l) \in \psi} a_i b_j c_k d_l.$$


Define the event A = {at least two x's from population
one} and B = {at least two x's from population two}. Then

$$\Pr\{A \cap B\} = h(\alpha) = 1 - (1-\alpha)^n - \alpha^n - n\alpha(1-\alpha)^{n-1} - n\alpha^{n-1}(1-\alpha)$$


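defines $h(\alpha)$. Let $\phi(z)$ be the standard normal pdf and

$$t(x;\theta) = \alpha^2(1-\alpha)^2 \sum_{\psi} \{\phi[(x_i-\mu_1)/\sigma_1]/\sigma_1\}\{\phi[(x_j-\mu_1)/\sigma_1]/\sigma_1\}\{\phi[(x_k-\mu_2)/\sigma_2]/\sigma_2\}\{\phi[(x_l-\mu_2)/\sigma_2]/\sigma_2\} \prod_{m \ne i,j,k,l} g(x_m),$$

where $g(x) = (1-\alpha)\,\phi[(x-\mu_1)/\sigma_1]/\sigma_1 + \alpha\,\phi[(x-\mu_2)/\sigma_2]/\sigma_2$.


Then the conditional likelihood of $\theta$ given $A \cap B$ is
proportional to

$$\ell(\theta;x) = t(x;\theta)/[h(\alpha)\,\alpha^2(1-\alpha)^2].$$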
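
As a numerical check on $h(\alpha)$, the closed form can be compared with a direct binomial sum. The sketch below assumes that the number of observations from population two is binomial $(n, \alpha)$, which is how the mixture is set up above. The names `h` and `h_direct` are chosen here for illustration.

```python
from math import comb


def h(alpha, n):
    """Closed form for Pr{A and B}; intended for n >= 4."""
    return (1.0
            - (1.0 - alpha) ** n                      # none from population two
            - alpha ** n                              # none from population one
            - n * alpha * (1.0 - alpha) ** (n - 1)    # exactly one from two
            - n * alpha ** (n - 1) * (1.0 - alpha))   # exactly one from one


def h_direct(alpha, n):
    """Same probability summed over k = 2, ..., n - 2 observations from two."""
    return sum(comb(n, k) * alpha ** k * (1.0 - alpha) ** (n - k)
               for k in range(2, n - 1))
```

The two functions agree for any $n \ge 4$. For $n = 4$ and $\alpha = 0.5$ both give 0.375, the probability of an exact two-two split.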

This can be simplified by defining

$$w(x) = (1-\alpha)\,\phi[(x-\mu_1)/\sigma_1]/(\sigma_1 g(x)),$$
$$v(x) = \alpha\,\phi[(x-\mu_2)/\sigma_2]/(\sigma_2 g(x)),$$

and $Q = \prod_{i=1}^{n} g(x_i)$.


Note that for all x, $w(x) + v(x) = 1$. Since there is no ambiguity
in the argument lists we will use $\ell$ for $\ell(\theta;x)$, $t$ for
$t(x;\theta)$ and $h$ for $h(\alpha)$. The subsequent expressions are
notationally simpler with the following additional correspondences:
let $w_i = w(x_i)$ and $v_i = v(x_i)$. Using these new variables

$$\ell = Q \sum_{\psi} w_i w_j v_k v_l / [h\,\alpha^2(1-\alpha)^2].$$
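
The restricted sum over $\psi$ has $n!/(n-4)!$ terms, so evaluating it term by term is practical only for small n. The sketch below shows one way to obtain the same value in a single pass over the sample. It is not the reduction to averages used later in this paper. It relies on the fact that the sum equals $2!\,2!$ times the coefficient of $s^2 t^2$ in $\prod_i (1 + w_i s + v_i t)$. Both function names are hypothetical.

```python
from itertools import permutations


def restricted_sum_bruteforce(w, v):
    """Sum of w_i w_j v_k v_l over ordered 4-tuples of distinct indices."""
    n = len(w)
    return sum(w[i] * w[j] * v[k] * v[l]
               for i, j, k, l in permutations(range(n), 4))


def restricted_sum_fast(w, v):
    """Same sum in O(n) time via a small polynomial recursion.

    c[a][b] accumulates the sum, over disjoint index sets of sizes a and b,
    of the product of the w's in the first set and the v's in the second.
    """
    c = [[0.0] * 3 for _ in range(3)]
    c[0][0] = 1.0
    for wi, vi in zip(w, v):
        # Descending order so each observation is used at most once.
        for a in range(2, -1, -1):
            for b in range(2, -1, -1):
                if a > 0:
                    c[a][b] += wi * c[a - 1][b]
                if b > 0:
                    c[a][b] += vi * c[a][b - 1]
    # Each unordered pair {i, j} and {k, l} appears 2 * 2 times in psi.
    return 4.0 * c[2][2]
```

Multiplying the result by $Q/[h\,\alpha^2(1-\alpha)^2]$ gives the conditional likelihood in the form displayed above.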
In order to maximize $\ell$ it is necessary to obtain the derivatives
of $t$ and $h$ with respect to all of the parameters in $\theta$.
For any function of $\theta$, say $p(\theta)$, we denote the various
derivatives by

$$p'_{\mu_1} = \partial p/\partial \mu_1, \quad p'_{\mu_2} = \partial p/\partial \mu_2, \quad \text{etc.}$$


There is a common structure in all of the $w$ and $v$ derivatives
that can be exploited to reduce the work. Henceforth, let
$\rho$ denote an arbitrary element of $\theta$ and define

$$q(x;\mu_1) = q(x;\mu_2) = x, \quad q(x;\sigma_1^2) = (x-\mu_1)^2, \quad q(x;\sigma_2^2) = (x-\mu_2)^2, \quad \text{and} \quad q(x;\alpha) = v(x).$$


Also define f(x; 0) = q(x; 9)- 9, a sign function 
' 4 2 Bg : a: 2 
So =) ate p= Uy> o ore oO  and* =s(0)*= 1)" it” 0 Uy» 0, 
Due to the asymmetry in some of the functions, define 
2 ; 
Otss 0) = Wix).. 1f “0 = MW, oF Oo; OG Or = Cx) at 
2 
p=U, or 0, and 6(x; a) = 


0 


Finally, define u and n? 


by 


n e . 
fet = rer q(x, 5 P)S(x,5 0) 


p n . 
and nni= veel 8 (x, 5 Os 


A reasonable amount of calculus yields for all op 
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PLS) 
t' =Q ia ea bs "a? 
p Pp 1 
+ 205 9 lay + fo te’ oe (1) 


For all of the results in this paper, more computational detail 
is available in Policello (1980). 


3.1 The $\rho \ne \alpha$ Cases. For $\rho \ne \alpha$, setting $\ell'_\rho = 0$ yields


re) 
o[n n 26) WW.V,.Vo) +2 s(P)ZwViOViVo 


-2s(p)2,W,V,W,W, Vv 


Del ape eke g! 


fe) 
nu Fe Ok 


cE ef Nek © PW ViWV,Vp 


+ 


28 (OL aCxs.5 P)W,V,WsW Vo (2) 


These equations will be used as a part of an iterative scheme
for computing the parameter estimates. However, the restricted
summations over $\psi$ must be reduced to simple unrestricted
summations or the procedure would require too many computation
steps to be practical. First define some averages, let


n 
aes z vin1 # 


mas P Mngt : rs 
Peet g F Maye Pay 
and define Yoo by 


43 =D. eee en 
TOO. uetWse dP ak ® 


It 


nd VG 2 2 
1 T9591 [oa9Sq1 42195 02+4510511501! 


2 
+m [26575] * 251 95,2] - S50: 


In practice, it was possible to obtain better numerical 
results when the r and s_ subscripts on Ae and Cf were 
r rs 


as close as possible. The final results required considerable 
rearrangement of the initial expressions. To this end, define 
. 6-(r+s) 
Nee by letting n Nite 
following equation: 


be the coefficient of r» in the 


ee 


ee ea ee 
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x +3 pv = ; 
[ yrs Pap ju very Leys P)W,V,W,W, Ve I 


5 Aube obo Se Arnos ia ~ Soy) 
stig Apel 
3 Ratt ONG 2 Lae y i nae 
- alse 3 are, Sob a deny 
* hater Pe ttn aiak ipsa cect 
For example, Yq2 = S10 = 3050501 + ce 


Sve a -1 
and Yoq = [Soy sbi 0501 +n Coil: Note that the xe are 


the same for all $\rho \ne \alpha$. Using this notation equation (2)
becomes 


le) 
P[ny on 


=i, = =2 
+ 2s(p) (¥44541 + ¥q95498 + Yo1591" + Yoo5o0" 
=) 
+ (03, - S5,)n J] 
2 p 
BY" 


eal 2 


0 gush 0 p - 
+ 28(O) [33444 * Vy2472™ — + Yoq4o1™ ~~ + Yo2490" 


Reb 70 of-3 
+ 35 An3)B Te 


These equations will be a major part of the iterative
computational methods and it will be convenient to have a compact
representation for them. Therefore, for $\rho \ne \alpha$, we represent
the above equations by $\rho\,C_\rho(x;\theta) = D_\rho(x;\theta)$ with
the obvious definitions for $C_\rho$ and $D_\rho$.


3.2 The Derivatives when $\rho = \alpha$. In this case setting
$\ell'_\alpha = 0$ yields


& =a tt an 1 (20-1) (4) 


~ (n-1)02 (1-0) 7¢ (1-0) 73 = Fy /n (a) 
-1 -l 


a <5] 
— 2 “Yoo l¥y 4511.7 Ya2512® + °¥ 21521" ~~ + Yo2%o0" 


ao 


Z 

As in the previous section it is convenient to represent this
equation in the form $\alpha\,C_\alpha(x;\theta) = D_\alpha(x;\theta)$ but
here $C_\alpha = 1$.


3.3 The Iterative Computation of $\theta$. We propose to calculate
the estimate of $\theta$ iteratively. Let $\theta(k)$ be the estimate
computed at the kth step of the procedure. To start the procedure
it is necessary to have an initial value, say $\theta(0)$. Take

$\bar{x} = n^{-1} \sum_{i=1}^{n} x_i$ and
$s^2 = (n-1)^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$. It was found that


= ce 
6(0) = Cea ioe Xx 40.15, 258°, 258°, -5) led to satisfactory 


solutions in a reasonable number of iterations. There may be 
better choices in some situations but the generally satisfactory 
performance for this initial vector was sufficient for our
purposes. Let $\theta'(k) = (\theta_1(k), \theta_2(k), \theta_3(k), \theta_4(k), \theta_5(k))$.
At step $k = 5m + j$, $m = 0, 1, 2, \ldots$ and $j = 1, \ldots, 5$, take

$$\theta_i(5m+j) = \theta_i(5m+j-1) \quad \text{for } i \ne j$$

and

$$\theta_j(5m+j) = D_j[x;\theta(5m+j-1)]/C_j[x;\theta(5m+j-1)] \quad \text{for } i = j.$$


This procedure can be viewed as a multivariate modification of 
the Newton-Raphson method for finding zeros of equations. 


4. THE MULTIVARIATE CASE 


For $i = 1$ or $2$, let $\phi_i$ be the density function for a
nonsingular k-variate gaussian distribution with mean vector $\mu_i$
and covariance matrix $\Sigma_i$. Let x be an arbitrary real
k-vector and $0 < \alpha < 1$, then the multivariate mixture density
is given by

$$g(x) = (1-\alpha)\,\phi_1(x) + \alpha\,\phi_2(x).$$

Likewise, for the vector x we extend the definitions of w
and v by $w(x) = (1-\alpha)\,\phi_1(x)/g(x)$, and $v(x) = 1 - w(x)$.
Continuing this process of extension, let $X_1, \ldots, X_n$ denote a
random sample of n observation vectors from the distribution
with density g(x) and let $x_i$ denote a realization of $X_i$.


Also define A = {at least two $x_i$ from population one},
B = {at least two $x_i$ from population two},
$Q = \prod_{i=1}^{n} g(x_i)$, $w_i = w(x_i)$, $v_i = v(x_i)$, etc.
Then, as before, the conditional
likelihood given $A \cap B$ is proportional to

$$\ell = Q \sum_{\psi} w_i w_j v_k v_l / [h(\alpha)\,\alpha^2(1-\alpha)^2].$$


Let p be the parameter designator, as before, but now 0 
X,, 2, and a. Then the extended 


tak h 
akes the values Aid Uns v *2 
definitions of es become 
u u 
ds f. 2yts oD 6 s 
‘i ae ae a “a Bia * Hp) OD) e 
aT n G s 
= = sac eny |) 
mA = Ryn Hy) HD)" @ 0) 
¥ 
caine r = ey Be PRS 
and n his = 25 Uy) Cx, Uy) (w,) (v5) 


aa} ise) a me) 
Note that A and 2X are k-vectors and i and A are 
rs rs rs rs 


kxXk matrices. 


Now using matrix differentiation, some recombination of 
terms and the direct analogue of the univariate development we 
arrive at equation (3) and equation (4). Of course, all of the 
quantities should now be given the appropriate multivariate 
interpretations. In this fashion the iterative solution
procedures in Section 3.3 can be directly applied to the multivariate
case.


5. ASYMPTOTIC RESULTS AND RELATED MATTERS 


The unconditional likelihood function is given by
$L = Q = \prod_{i=1}^{n} g(x_i)$. If we formally proceed and try to
maximize this by setting the derivatives equal to zero we obtain: for
$\rho \ne \alpha$

$$\rho = \sum_{i=1}^{n} q(x_i;\rho)\,\delta(x_i;\rho) \Big/ \sum_{i=1}^{n} \delta(x_i;\rho), \tag{5}$$

and

$$\alpha = n^{-1} \sum_{i=1}^{n} v(x_i), \tag{6}$$

where $q(x_i;\mu_1) = q(x_i;\mu_2) = x_i$,
$q(x_i;\Sigma_1) = (x_i-\mu_1)(x_i-\mu_1)'$ and
$q(x_i;\Sigma_2) = (x_i-\mu_2)(x_i-\mu_2)'$. The iterative estimates based
on equations (5) and (6) are of the EM type (Dempster, Laird
and Rubin, 1977) and, hence, converge to a local maximum. In
the switching regression setting, Kiefer (1978) has shown that
a root of the likelihood equations corresponding to a local
maximum is consistent, asymptotically normal and efficient. These
two results, taken together, show that the MLE type procedures
will asymptotically produce "best" estimators. In fact, once
the solution for equations (5) and (6), say $\hat{\theta}$, is obtained,
it is possible to consistently estimate the covariance matrix for
$\hat{\theta}$ using the inverse of the estimated information matrix
(Kiefer, 1978).
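
For the univariate case, iterations of the EM type built on equations (5) and (6) can be sketched as follows. This is an illustration of the unconditional MLE type procedure only, not of the conditional estimator of Section 3. It uses the standard EM convention of computing each variance about the freshly updated mean, which agrees with equation (5) at a fixed point. The function names are hypothetical.

```python
import math


def normal_pdf(x, mu, var):
    """Gaussian density with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_update(xs, mu1, mu2, var1, var2, alpha):
    """One EM type update of (mu1, mu2, var1, var2, alpha)."""
    v = []
    for x in xs:
        a = (1.0 - alpha) * normal_pdf(x, mu1, var1)
        b = alpha * normal_pdf(x, mu2, var2)
        v.append(b / (a + b))                  # v(x_i); w(x_i) = 1 - v(x_i)
    w = [1.0 - vi for vi in v]
    sw, sv = sum(w), sum(v)
    new_mu1 = sum(wi * x for wi, x in zip(w, xs)) / sw
    new_mu2 = sum(vi * x for vi, x in zip(v, xs)) / sv
    new_var1 = sum(wi * (x - new_mu1) ** 2 for wi, x in zip(w, xs)) / sw
    new_var2 = sum(vi * (x - new_mu2) ** 2 for vi, x in zip(v, xs)) / sv
    new_alpha = sv / len(xs)                   # equation (6)
    return new_mu1, new_mu2, new_var1, new_var2, new_alpha


def log_likelihood(xs, mu1, mu2, var1, var2, alpha):
    """Unconditional log likelihood, log Q."""
    return sum(math.log((1.0 - alpha) * normal_pdf(x, mu1, var1)
                        + alpha * normal_pdf(x, mu2, var2)) for x in xs)
```

The sketch has no safeguard against the singular points of the likelihood. If one component collapses onto a single observation the variance update tends to zero, which is exactly the difficulty discussed in Section 2.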


Hosmer (1973) has shown that, for "reasonable" sample sizes 
and initial values, the iterative MLE type estimators will not 
converge to $\theta$ values associated with the singular points of
the likelihood function. Unfortunately, for small sample sizes 
the singularities can be a problem when the component distribu- 
tions are not well separated. 


Taken together the above observations show that the uncon- 
ditional MLE type procedures, using equations (5) and (6), lead 
to consistent, asymptotically normal and efficient estimators of $\theta$.
We now connect this with the estimators developed in this paper. 


Note that equation (3) can be written as

$$\rho + O(n^{-1}) = \sum_{i=1}^{n} q(x_i;\rho)\,\delta(x_i;\rho) \Big/ \sum_{i=1}^{n} \delta(x_i;\rho) \tag{7}$$

and equation (4) can be written as

$$\alpha + O(n^{-1}) = n^{-1} \sum_{i=1}^{n} v(x_i), \tag{8}$$

where, as usual, $O[g(n)]$ means
$\limsup_{n \to \infty} |O[g(n)]/g(n)| < \infty$.


Thus, for large n, the solution to equations (3) and (4),
$\tilde{\theta}$, and the MLE type estimator, $\hat{\theta}$, will differ
only by a term of order $n^{-1}$. This being the case, $\tilde{\theta}$
shares all of the desirable asymptotic properties of $\hat{\theta}$.
Thus, $\tilde{\theta}$ is consistent, asymptotically normal and efficient.


In equations (7) and (8), the $O(n^{-1})$ terms serve to remove
the problems associated with the singularities in the
unconditional likelihood approach. Hence, for the MLE type procedures
the singularities disappear, as a computational problem, at a
rate of $O(n^{-1})$. This theoretical observation is consistent
with the empirical observations of Wolfe (1965, 1970, 1971), who
noticed that the singularities caused computational problems only
when $n\alpha < 1$.


6. THE MONTE CARLO STUDY 
The Monte Carlo study was conducted on the Amdahl 470 com- 


puter at the Ohio State University. The SUPER DUPER random num- 
ber generator (Marsaglia, MacLaren and Bray, 1964) was used to 
generate all uniform and gaussian deviates. A random sample of
size n from a mixture of a gaussian $(\mu_1, \sigma_1^2)$ distribution
and a gaussian $(\mu_2, \sigma_2^2)$ distribution with mixing parameter
$\alpha$, say $X_1, \ldots, X_n$, was obtained by generating n Uniform
(0,1) variates, $U_1, \ldots, U_n$, and n standard gaussian
variates, $Z_1, \ldots, Z_n$. Then if $U_i \le \alpha$,
$X_i = \sigma_2 Z_i + \mu_2$ and if $U_i > \alpha$,
$X_i = \sigma_1 Z_i + \mu_1$.

The moment generating function approach of Quandt and 
Ramsey (1978) appears to be the most effective estimation pro- 
cedure, not based on MLE principles. Hence, we decided to use 
their parameter selections in order to provide a reasonable 
basis for comparisons. The seven cases, parameter selections 
and sample sizes, are given in Table 1. All seven cases in our 
study used 100 simulation repetitions and the resulting estimated 
MSE's and estimated standard deviations for the MSE's are reported 
in Table 2. 


In many problems the standard deviations, not the variances, 
are the quantities of interest. For this reason Table 2 also 
contains the estimated MSE's for the standard deviations of the 
populations. 


7. SUMMARY 


The classical MLE estimation approach cannot be applied to 
mixtures of gaussian populations since the likelihood function is 
unbounded at every observation. However, formal application of 
the method appears to lead to "good" estimators in many cases.


This paper developed a conditional MLE approach given at 
least two observations from each population in the mixture. Con- 
ceptually, some might object to this on the grounds that it is 
impossible to know when the sample contains at least two obser- 
vations from each population. Practically, this is no problem.

We simply define the estimator to be the solution of the condi- 
tional likelihood equations for all situations. It is our con- 
tention that this is as reasonable as any other procedure, when 
there are fewer than two observations from one of the populations 
and we must estimate all five parameters, 


After the procedures were developed they were related to the
unconditional, formal, MLE approach. It was shown that the

TABLE 1: Parameter selection summary.




Case 

Parameter si 2 3 4 5 6 7 

Wy =3y -1 -lise —3. -3. -3. op 

Uy 3 We i 3, 3. ef 3. 

of ib alt Bi ihe, al A bi ie 

05 3 3 Cee ee 9 

a 5 5) S 5) aga 75 3} 

n 50 50 100 50 50 50 50 


TABLE 2: Estimated mean square errors and average number of
iterations to convergence*.


Case 
Parameter 1 2 3 4 5 6 7/ 
Wy -049 -148 -095 .074 042 1.26 eB Sieh 
(FOLL)"* “CCOL9)” yes01ee ae (COIS) C007). “ea266) ae 042) 
Uy 265 RSS) - 800 TE 5.44 7.60 - 434 
CL022)""CalLO0 YY *CaleOy C298) Ce 7 O44) Gl OG) CeO a4 
Oo, -032 .-070 - 046 061 e338 31428 lily, 
(05.5) « 7G0016), » .€7008)," Ge009),! (2101).c-66536)inae. Oe) 
o5 082 a225 - 240 932 DOr 2.09 . 100 
(009) "C1 037) “°C 038)" C T34y C492) ee alos tle) 
of 21:23. «36 -169 -290 2.86 94.1 - 300 
(3025) +(. 144). 0nGcO24)c6. 055) ox€2. ake Ce0abtaG.0.o) 
05 939 USTs! TASzZ 40.9 265.5 207.6 2220 
CoLLSy 74. 245)" ? (2224) ACK ORY m* (5855594096. be eee) 
a - 006 -019 0.29 SOs Ode 3053 -018 


(.001) .(.003). (.004) (¢.002)..;(.003) ..(.006), (.004) 


iheerations ella 47.8 69NS IMS aa) L5e0 41.0 46.5 


*Estimated standard deviation for each MSE is given in
parentheses below the number.
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methods only differed by terms of order Ota as Thus, the con- 
ditional approach leads to consistent, asymptotically normal and 


efficient estimators. The Oth) term also helps to explain the 
computational results reported by Wolfe (1965, 1970, 1971). 


For the univariate case we report the results of a simulation 
study showing that the conditional likelihood approach leads to 
reasonable estimators. It is interesting to note that the con- 
ditional likelihood approach is easily modified to cover the 
switching regressions problem which was also considered in the 
Quandt and Ramsey paper. However, there does not appear to be an
easy modification of the moment generating function approach to
cover the multivariate case, whereas the conditional likelihood
approach leads directly to a solution for the multivariate case.
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ESTIMATION OF PARETO PARAMETERS BY NUMERICAL 
METHODS 


W. LYLE COOK and DEAN C. MUMME 


Department of Mathematics 
Idaho State University 
Pocatello, Idaho 83209 USA 


SUMMARY. For the case when neither of the parameters of a 

Pareto distribution is known, an iterative process on the unbiased 
maximum likelihood estimators is investigated. Results from ran- 
domly generated samples are compared with other methods of 
estimation. 


KEY WORDS. Pareto distribution, estimation, numerical 
approximation. 


1. INTRODUCTION 


The Pareto distribution has as its most celebrated applica- 
tion the modeling of incomes which exceed a minimum value. The 
cumulative distribution function (c.d.f.) is 


$$F(x) = 1 - (a/x)^b, \qquad x \ge a, \quad a, b > 0;$$


and the probability density function is


$$f(x) = b\,a^b / x^{b+1}.$$


When the parameter a is known, the geometric mean of a sample 
of size n is sufficient for b, and when b is known the 
first order statistic is sufficient for a. Together, the first
order statistic and the geometric mean form a joint set of 
sufficient statistics for a and b (Malik, 1970). When either 
a or b is known, the estimation of the other is relatively 
straightforward. When neither of the parameters is known, the 
available estimators may be manipulated so that the value of a 
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parameter obtained from one can be used in estimating the second. 
Such a situation suggests a numerical iteration procedure to 
refine successive estimates. In this paper, we report the 
results of checking this strategy using artificially generated 
populations. The results are compared with estimates arrived 

at through other standard methods. 


2. ESTIMATORS 


2.1 Least Squares. Let $X_1, X_2, \ldots, X_n$ be a random sample from
a population following the Pareto distribution and let
$Y_1, Y_2, \ldots, Y_n$ denote the corresponding order statistics. One
method of estimation involves least squares. By noting that
$1 - F(x) = (a/x)^b$ and taking logs we have the expression

$$\log(1 - F(x)) = b \log a - b \log x.$$

This is a linear equation in y = log(1 - F(x)) and z = log x.
The standard least squares estimates of the slope $-b$ and
the constant $b \log a$ can be used to approximate the parameters.
With pure artificially generated data the least squares estimates
return exact values of the parameters. Clearly this limits
the suitability of the least squares method in our comparisons.
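The remark about exact recovery on noise-free data can be illustrated with a short Python sketch. It is not the authors' program: the function name, the use of `numpy.polyfit`, the plotting positions $(i - 0.5)/n$, and the values a = 3, b = 2 are assumptions made here for illustration.

```python
import numpy as np

def pareto_least_squares(x, survival):
    """Fit log(survival) = b*log(a) - b*log(x) by ordinary least squares."""
    z = np.log(np.asarray(x, dtype=float))
    y = np.log(np.asarray(survival, dtype=float))
    slope, intercept = np.polyfit(z, y, 1)    # y = slope * z + intercept
    b_hat = -slope                            # slope is -b
    a_hat = np.exp(intercept / b_hat)         # intercept is b * log(a)
    return a_hat, b_hat

# Noise-free illustration: exact Pareto quantiles (hypothetical a = 3, b = 2).
a_true, b_true, n = 3.0, 2.0, 25
p = (np.arange(1, n + 1) - 0.5) / n           # assumed plotting positions
x_exact = a_true * (1.0 - p) ** (-1.0 / b_true)
a_hat, b_hat = pareto_least_squares(x_exact, 1.0 - p)
```

Because the points lie exactly on the line, the fit returns the generating parameters up to rounding, which is why the method is uninformative on unperturbed artificial data.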


2.2 Moments. The expected value of a Pareto variable is
$ab/(b-1)$ provided $b > 1$. Equating $EX$ with $\bar{X}$ and solving
for b yields

$$b = \bar{X}/(\bar{X} - a). \qquad (1)$$

Since the c.d.f. of $Y_1$ is given by

$$F_{Y_1}(x) = 1 - (1 - F(x))^n = 1 - (a/x)^{nb}, \qquad x > a,$$

it follows that $EY_1 = anb/(nb-1)$, or

$$a = \frac{(nb-1)Y_1}{nb}. \qquad (2)$$

Combining this with (1) gives

$$b = \frac{n\bar{X} - Y_1}{n(\bar{X} - Y_1)}.$$

Thus b can be estimated from $Y_1$ and $\bar{X}$, and subsequently
used in (2) to estimate a.
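The moment estimators can be sketched as follows. The expression for b is the one obtained by substituting (2) into (1); the function name and the sample values are hypothetical and serve only as an illustration.

```python
import numpy as np

def pareto_moment_estimates(x):
    """Moment-type estimates of (a, b) from the sample mean and the first order statistic."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    y1 = x.min()                                   # first order statistic Y_1
    b_hat = (n * xbar - y1) / (n * (xbar - y1))    # (2) substituted into (1)
    a_hat = (n * b_hat - 1.0) * y1 / (n * b_hat)   # equation (2)
    return a_hat, b_hat

# Hypothetical sample, for illustration only.
sample = np.array([2.1, 2.5, 3.0, 3.7, 5.2, 9.8])
a_mom, b_mom = pareto_moment_estimates(sample)
```

By construction the pair satisfies both (1) and (2), and the estimate of b always exceeds 1, in line with the requirement that the mean exist.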


2.3 Maximum Likelihood and Sufficient Estimators. Maximizing the
likelihood function,

$$L(x_1, \ldots, x_n) = \prod_{i=1}^{n} \frac{b\,a^b}{x_i^{b+1}},$$

or its logarithm gives an estimate for b of

$$\hat{b} = \frac{n}{\sum \log X_i - n \log a}.$$

Subject to the constraints on X the likelihood function is
maximized in a when $a = Y_1$.

Another method of estimating a and b is to use the joint
sufficient pair $(Y_1, \sum \log(X_i/Y_1))$. The equivalent expressions --
which would be unbiased if the parameters were fixed -- will be
referred to as the sufficient estimators. They are

$$\hat{a} = \frac{(n\hat{b} - 1)Y_1}{n\hat{b}}, \qquad \hat{b} = \frac{n-2}{\sum \log X_i - n \log Y_1}.$$

Here again b is calculated first and then used in the estimate
of a. This is a slight variation of the maximum likelihood
procedure which estimates a through $Y_1$ then estimates b
through

$$\hat{b} = \frac{n-1}{\sum \log X_i - n \log a}.$$

2.4 Iteration. The original motivation for the investigation was
to refine the sufficient estimate by successive iterations from
a to b and back, by taking $a_0 = Y_1$, then

$$b_k = \frac{n-1}{\sum_j \log X_j - n \log a_{k-1}}, \qquad a_k = \frac{(n b_k - 1)Y_1}{n b_k}, \qquad k = 1, 2, \ldots$$

Each of the values $a_k$, $b_k$ is unbiased when the other parameter
is known. Each of the sequences is bounded and monotone and
hence converges. It would be desirable to have each converge
to the respective unknown parameters, but this would be asking
too much of the sample data. A more realistic expectation would
be to have the sequences converge to values which improve the
initial estimates, and which would be superior to estimates
arrived at by other methods.
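A minimal sketch of the back-and-forth iteration is given below. It assumes the alternation uses the two conditionally unbiased forms from Section 2.3, namely b = (n-1)/(sum of log X - n log a) and a = (nb-1)Y_1/(nb); the function name, the default of four passes, and the test data are illustrative assumptions rather than the authors' program.

```python
import numpy as np

def pareto_iterate(x, n_iter=4):
    """Alternate between the estimates of b (given a) and a (given b), starting at a_0 = Y_1."""
    x = np.asarray(x, dtype=float)
    n = x.size
    y1 = x.min()
    sum_log = np.log(x).sum()
    a = y1                                    # a_0 = Y_1
    history = []
    for _ in range(n_iter):
        b = (n - 1) / (sum_log - n * np.log(a))
        a = (n * b - 1.0) * y1 / (n * b)
        history.append((a, b))
    return history

# Hypothetical data: exact quantiles of a Pareto law with a = 3, b = 2.
p = (np.arange(1, 26) - 0.5) / 25
data = 3.0 * (1.0 - p) ** (-0.5)
steps = pareto_iterate(data, n_iter=30)
```

Each pass shrinks the change in a by a factor of roughly 1/(n-1), which is consistent with the observation that a handful of iterations already stabilises several decimal places.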


2.5 Combination. It can be shown that the sequence $\{a_k\}$
is decreasing, thus the estimate arrived at for a through
iteration must be less than $Y_1$, which is the maximum likelihood
estimator. In observing the trial runs, however, it was noticed
that quite often this iteration yielded a value less than the
known value of a.


An estimator that combined $Y_1$ with the iterated estimate
of a was introduced to partially correct for the occasional
undershoot. Since four iterations usually gave an estimate of
a which was stable to five decimal places, a ratio of 1:4 was
selected. Thus, the combination estimate for a is

$$\hat{a} = \frac{Y_1 + 4(\text{iterated } a)}{5}.$$


3. CRITERIA 


Quandt (1966) has performed sampling experiments using 
moments, maximum likelihood, least squares, and quantiles. His 
performance criteria were bias, mean square error, and bias in 
Lorenz coefficients. He found no statistically significant 
differences based on these criteria, but informally felt the 
maximum likelihood and quantiles had an edge. He then proposed 
a method of fitting by minimizing a modified difference function 
which gave results comparable to the other methods. Our tests 
compare moments, maximum likelihood, iterations, sufficient, and 
combination methods. As in Quandt's experiments, bias will be
a judgment criterion, but rather than mean square error, we have
tabulated the number of times each estimator among the five was
closest to the true value of the parameter.
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4. SAMPLING 


One hundred samples of each size were generated on a 
Tektronix 4052 by inverting uniformly random data from the unit 
interval. To account for the random error, which usually exists 
in real data, the data points were then shifted a scaled random 
amount left or right according to y = y + .2(y-a)(RND[-.5, .5]). 
Samples of size 25, 75, 150 were generated and a spectrum of 
parameter values was used, on the chance that differences might 
occur as the ratio of a to b changes. Averages were 
calculated and reported for each method for each set of samples. 
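The sample generation described above can be sketched as follows. This is an illustrative reading, not the original Tektronix program: the function name and the NumPy generator are assumptions, and `rng.uniform(-0.5, 0.5)` stands in for RND[-.5, .5].

```python
import numpy as np

def perturbed_pareto_sample(n, a, b, rng, scale=0.2):
    """Invert uniform variates through F(x) = 1 - (a/x)^b, then shift each point."""
    u = rng.uniform(0.0, 1.0, size=n)
    y = a * (1.0 - u) ** (-1.0 / b)           # inverse c.d.f. applied to uniform data
    shift = rng.uniform(-0.5, 0.5, size=n)    # stands in for RND[-.5, .5]
    return y + scale * (y - a) * shift        # y = y + .2 (y - a) RND[-.5, .5]

# Hypothetical usage with one of the parameter pairs that appears in the table.
rng = np.random.default_rng(4052)
sample_25 = perturbed_pareto_sample(25, 20.0, 5.0, rng)
```

Because the shift is proportional to y - a and its factor never drops below 0.9, a perturbed point cannot fall below a, so the support of the distribution is respected.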


5. COMMENTS 


The raw sufficient statistic does not compete well for either 
parameter. This is evident either from a cursory examination of 
the data or through analysis of the counts, ranks, or averages. 
There are other statistically significant differences but they do 
not seem strong enough to adhere to rigidly, especially consider- 
ing the extreme deviation of the sufficient estimator and the 
possibility of interaction. Some casual observations and inter- 
pretations may be of greater value. 


The moments statistic shows very little bias in estimating
a; however, it is the closest estimate in relatively few samples.
Conversely, its bias in estimating b is high, but it is the
closest estimator in a good number of samples. The bias
readings are consistent with Quandt (1966), but his m.s.e.
measurements would not lead a person to expect the closeness readings
we observed. The combination statistic, while not a total loss,
does not seem to give enough improvement to warrant future
consideration. The iteration statistics are not runaway
improvements either, but it is safe to say they are competitive,
especially when evaluating b. In this case the iteration
statistic had the best bias reading and the best closeness count
for all values of the parameter selected.


The methods of estimating parameters from a Pareto population
utilizing maximum likelihood, iterations, and moments all seem
to be of roughly the same order of magnitude of desirability.
It may be that maximum likelihood improves with increasing sample
size, while the iteration may be of most value when N is small.
Moments are equally acceptable and competitive according to
our criteria at all sample sizes, and may perform inversely in
relation to their bias.


The iteration method certainly is not any worse than standard
techniques, and may after more careful analysis be shown to have
minor advantages.
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TABLE OF PERFORMANCE OF ESTIMATORS 
(Numbers in parenthesis record the number of 
times the estimator was closest to the true value.) 


                 Maximum
                 Likelihood      Iterate          Moments          Sufficient        Combination

a = 1, b = 1.5
n = 25     a     1.0296 (33)     1.0010 (42)      1.0053 (7)       1.0346 (1)        1.0067 (17)
           b     1.5770 (19)     1.5104 (43)      1.7604 (36)      1.7138 (2)
n = 75     a     1.0098 (32)     1.0006 (35)      1.0012 (9)       0.9989 (2)        1.0024 (22)
           b     1.5101 (26)     1.4896 (38)      1.5984 (30)      1.3691 (6)
n = 150    a     1.0050 (34)     1.0005 (28)      1.0007 (10)      0.9981 (1)        1.0014 (27)
           b     1.5229 (24)     1.5127 (45)      1.5938 (30)      [illegible] (1)
TOTAL      a     .0443 (99)      .0021 (105)      .0081 (26)       .0377 (4)         .0106 (66)
BIAS       b     .1100 (69)      .0334 (126)      .4526 (96)       [illegible]

a = 20, b = 5
n = 25     a     20.1587 (34)    19.9960 (34)     20.0031 (13)     20.1366 (0)       20.0286 (19)
           b     [illegible] (17) 5.1694 (42)     5.3976 (36)      5.2803 (5)
n = 75     a     20.0577 (29)    20.0040 (32)     20.0048 (13)     19.9572 (0)       20.0147 (26)
           b     5.1082 (25)     5.3901 (40)      5.1225 (33)      5.2782 (2)
n = 150    a     20.0250 (35)    19.9984 (26)     19.9987 (8)      19.9893 (0)       20.0038 (31)
           b     5.1034 (19)     5.0691 (46)      5.1304 (33)      4.7008 (2)
TOTAL      a     .2414 (98)      .0096 (92)       .0092 (34)       .1901 (0)         .0470 (76)
BIAS       b     .6066 (61)      .2776 (128)      .6505 (102)      .8578 (9)

a = 6000, b = 2
n = 25     a     6133.2 (30)     6007.0 (33)      6017.7 (14)      5908.5 (2)        6032.3 (21)
           b     2.0965 (16)     2.0093 (46)      2.1970 (34)      1.5961 (4)
n = 75     a     6043.8 (31)     6003.0 (29)      6004.1 (11)      6026.5 (5)        6011.2 (24)
           b     2.0823 (19)     2.0048 (43)      2.0713 (36)      2.2271 (2)
n = 150    a     6021.4 (38)     6001.5 (24)      6001.8 (18)      6058.0 (2)        6005.5 (18)
           b     2.0474 (20)     2.0337 (37)      2.0684 (41)      1.9702 (2)
TOTAL      a     198.44 (99)     11.53 (86)       23.63 (43)       176 (9)           48.92 (63)
BIAS       b     .1762 (55)      .0467 (126)      .3366 (111)      .6609 (8)
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A NEW ESTIMATION PROCEDURE FOR THE 
THREE-PARAMETER LOGNORMAL DISTRIBUTION 
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and 
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SUMMARY. A new procedure for estimating the parameters of the 
lognormal distribution is presented. A Monte Carlo study, 
carried out with samples of various sizes, has shown that the 
procedure is more efficient than maximum-likelihood, when samples 
are small. The procedure allows, also, hypothesis testing of 
lognormality. A table of percentage points is presented for this 
purpose. Finally, an example illustrates the application of the 
procedure. 


KEY WORDS. logarithmic transformation, parameter estimation, 
test for lognormality. 


1. INTRODUCTION 


A distribution of X is said to be lognormal if there is
a value $\theta < X$ such that

$$Z = \log(X - \theta) \sim N(\mu, \sigma^2). \qquad (1)$$

The probability density function of a variable X lognormally
distributed is

$$P_X = [(X - \theta)\,\sigma\sqrt{2\pi}]^{-1} \exp\{-[\log(X - \theta) - \mu]^2 / 2\sigma^2\}. \qquad (2)$$

The parameters $\mu$ and $\sigma$ are, respectively, the mean and the
standard deviation of the distribution of Z, and $\theta$ is a
threshold value below which it is not possible to have values
of X.


Prof. Amato has contributed the Appendix.


Many procedures have been proposed to estimate the parameters
of a lognormal distribution (Aitchison and Brown, 1957; Johnson
and Kotz, 1970; Giesbrecht and Kempthorne, 1976). Among them,
the maximum likelihood method estimates the value $\theta$ iteratively
such that

$$\sum (x_i - \theta)^{-1} + \Big[\sum y_i/(x_i - \theta)\Big]\,\sigma^{-1} = 0 \qquad (3)$$

where:

$$y_i = [\log(x_i - \theta) - \mu]/\sigma \qquad (4)$$

$$\mu = \Big[\sum \log(x_i - \theta)\Big]/n \qquad (5)$$

$$\sigma = \Big\{\sum [\log(x_i - \theta) - \mu]^2 / n\Big\}^{1/2} \qquad (6)$$


In a previous note (Chieppa and Ricci, 1979), after discussing
the meaning and the importance of the parameter $\theta$ in the lognormal
distribution, a new numerical procedure was suggested for
estimating $\theta$. This procedure is now suggested either for parameter
estimation or for testing the hypothesis of lognormality.


2. THE PROCEDURE 


If a random variable Z is normally distributed, the
skewness index is

$$\sqrt{\beta_1} = n^{-1}\Big[\sum (Z_i - \mu)^3\Big]/\sigma^3 = 0 \qquad (7)$$

and the kurtosis index is

$$\beta_2 = n^{-1}\Big[\sum (Z_i - \mu)^4\Big]/\sigma^4 = 3. \qquad (8)$$

A value $\theta$ can be found by solving

$$\sqrt{\beta_1} = n^{-1}\Big\{\sum [\log(X_i - \theta) - \mu]^3\Big\}/\sigma^3 = 0, \qquad (9)$$

where $\mu$ and $\sigma$ are given by (5) and (6). Furthermore, if
the same value $\theta$ satisfies approximately

$$n^{-1}\Big\{\sum [\log(X_i - \theta) - \mu]^4\Big\}/\sigma^4 = 3,$$

then the distribution of X can be considered reasonably
lognormal with $\theta$, $\mu$ and $\sigma$ being the parameter estimates.

Therefore an iterative procedure is suggested for
estimating $\theta$, $\mu$ and $\sigma$, fixing the condition

$$|\sqrt{b_1}| < \varepsilon$$

for some sufficiently small value of $\varepsilon$.

The existence of $\theta$, and, consequently, of $\mu$ and $\sigma$, is
demonstrated in the Appendix. If several different $\theta$-estimates
exist, then the largest is to be preferred. Afterward, $\beta_2$ can
be calculated in order to test the hypothesis:

$$H_0: \beta_2 = 3 \qquad H_1: \beta_2 \ne 3$$
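One possible way to carry out this search is sketched below in Python. It is not the authors' program: the doubling-then-bisection strategy, the function names, the starting gap of 1e-9 times the sample range, and the test sample are all assumptions made for illustration. The search starts just below the sample minimum, where the skewness of log(x - theta) is negative, and moves outward until the sign changes, so it returns the crossing nearest the minimum (the largest theta) only up to the resolution of the doubling steps.

```python
import numpy as np

def log_moments(x, theta):
    """Return (mu, sigma, skewness, kurtosis) of z = log(x - theta), as in (5)-(8)."""
    z = np.log(x - theta)
    mu = z.mean()
    d = z - mu
    sigma = np.sqrt(np.mean(d ** 2))
    return mu, sigma, np.mean(d ** 3) / sigma ** 3, np.mean(d ** 4) / sigma ** 4

def fit_threshold(x, eps=0.05, max_doublings=60, max_bisections=200):
    """Search below min(x) for theta such that |skewness of log(x - theta)| < eps."""
    x = np.asarray(x, dtype=float)
    x_min = x.min()
    gap = 1e-9 * (x.max() - x_min)            # theta = x_min - gap, very close to x_min
    if log_moments(x, x_min - gap)[2] >= 0:
        raise ValueError("skewness is not negative near min(x)")
    # Widen the gap until the skewness changes sign.
    for _ in range(max_doublings):
        lo, gap = gap, 2.0 * gap
        if log_moments(x, x_min - gap)[2] >= 0:
            break
    else:
        raise ValueError("no sign change found; data may not be positively skewed")
    hi = gap
    # Bisection on the gap: skewness < 0 at lo, >= 0 at hi.
    for _ in range(max_bisections):
        mid = 0.5 * (lo + hi)
        theta = x_min - mid
        mu, sigma, skew, kurt = log_moments(x, theta)
        if abs(skew) < eps:
            break
        if skew < 0:
            lo = mid
        else:
            hi = mid
    return theta, mu, sigma, kurt

# Hypothetical sample: X = exp(Z) + 10 with Z ~ N(1, 0.5^2).
rng = np.random.default_rng(7)
x_sim = 10.0 + np.exp(rng.normal(1.0, 0.5, size=500))
theta_hat, mu_hat, sigma_hat, beta2_hat = fit_threshold(x_sim, eps=1e-6)
```

The returned kurtosis can then be compared with significance limits to judge whether the lognormal hypothesis is tenable.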


3. MONTE CARLO STUDY OF ESTIMATES 


In order to check the properties of the suggested procedure,
Monte Carlo experiments were performed on the IBM 370/158
computer of Bari University. Random samples of sizes 20, 30, 50,
100, 200, and 500 were generated from a normal population with
parameters $\mu = 4$ and $\sigma = 2$. Five hundred samples were
generated for each sample size. Each pseudo-random number Z was
transformed as follows:

$$X = e^{Z} + \theta, \qquad (10)$$

in order to get samples from a lognormal population with parameters
$\mu = 4$, $\sigma = 2$ and $\theta = 10$. These parameters have been chosen
in order to make useful comparisons with Harter and Moore (1966).


For each sample, parameters $\theta$, $\mu$, $\sigma$ and $\beta_2$ were
estimated according to the suggested procedure. For $\varepsilon$ the value
0.05 has been fixed; other simulations (Chieppa and Ricci, 1979)
found this value convenient. Furthermore, estimation of the
parameters $\mu$, $\sigma$, $\theta$ and $\sqrt{\beta_1}$ was carried
out with the maximum likelihood method, according to Hill (1963).


Means and variances of the estimates, calculated from the
500 samples, using both procedures are given in Table 1.


The results lead to the following conclusions: 


a) For the suggested procedure, estimators $\hat\theta$, $\hat\sigma$ and $\hat\beta_2$
show a negative bias, while estimator $\hat\mu$ shows a positive bias.
Bias and variances decrease as sample size increases; therefore
the suggested procedure seems to be consistent. However,
consistency is not a necessary property because of the
considerations under b2.

TABLE 1: Means and variances of the estimators computed on 500
random samples, each of size n, generated from a lognormal
population with parameters $\mu = 4$, $\sigma = 2$ and $\theta = 10$.

Sample      suggested procedure          max. likelihood method
size        mean        variance         mean        variance

$\theta$
 20         8.652       39.5220         -19.981      3292.8442
 30         9.241       12.5459          -6.187      3632.2029
 50         9.623        3.1648           8.453       770.0303
100         9.950        0.4760          10.346         0.1486
200         9.780        0.2880          10.168         0.0439
500         9.848        0.0735          10.093         0.0116

$\mu$
 20         3.974        0.2583           4.249         0.8129
 30         4.002        0.4817           4.046         0.4656
 50         4.013        0.0859           3.936         0.1202
100         3.994        0.0468           3.952         0.0436
200         4.005        0.0204           3.965         0.0183
500         4.007        0.0094           3.980         0.0087

$\sigma$
 20         1.955        [illegible]      [illegible]   0.3220
 30         1.964        [illegible]      [illegible]   0.2062
 50         1.964        0.0665           2.081         0.0666
100         1.993        0.0257           2.053         0.0235
200         1.963        0.0154           2.023         0.0125
500         [illegible]  0.0058           2.019         0.0044

$\sqrt{\beta_1}$
 20         --           --              -0.072         0.9409
 30         --           --              -0.146         0.5279
 50         --           --              -0.227         0.0780
100         --           --              -0.113         0.0297
200         --           --              -0.086         0.0181
500         --           --              -0.046         0.0076

$\beta_2$
 20         2.540        0.4709           3.698         2.9554
 30         2.670        0.5093           3.513         2.4640
 50         [illegible]  0.3313           3.098         0.1974
100         2.885        0.2797           3.070         0.0957
200         2.850        0.1703           3.033         0.0529
500         2.868        [illegible]      2.986         0.0243
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b) A comparison between the suggested procedure and the 
maximum likelihood method shows that: 


b1) When maximum likelihood is used, in the case of small
samples, the estimate of $\theta$ is more biased and has a larger
variance; on the other hand, in the case of large samples, such
an estimate is less biased (but with positive bias).


b2) The estimates of $\mu$ and $\sigma$ are substantially the same for
both estimation procedures; nevertheless it seems that the
suggested procedure is more efficient in the case of small
samples; the bias shows an opposite sign.


b3) The estimate of $\sqrt{\beta_1}$ has a negative bias in the case
of the maximum likelihood method; no estimate exists, obviously,
in the case of the procedure suggested in this paper.


b4) The ML estimate of $\beta_2$ shows a positive bias (the
procedure here suggested shows a negative bias); variances,
which are larger when sample sizes are small, become smaller
when sample sizes increase.


It can be said, in conclusion, that the suggested procedure
seems to be more efficient than MLE when samples are small and
less efficient when samples are large. The maximum likelihood
method yields results in accordance with those obtained by
Harter and Moore (1966); the only exception is for $\theta$ when n
equals 50.


4. TESTING THE HYPOTHESIS OF LOGNORMALITY 


In order to test the hypothesis of lognormality, on the
basis of the procedure suggested in this paper, a test of the
hypothesis:

$$H_0: \beta_2 = 3 \qquad H_1: \beta_2 \ne 3$$

is used. The significance limits of $\beta_2$ for a normal distribution
are well-known (Vianelli, 1959), but they are not useful in this
case, as the distribution of $\beta_2$ is conditioned by the fixed
value of $\sqrt{b_1}$ ($|\sqrt{b_1}| < \varepsilon$).


Bowman and Shenton (1975) studied the joint distribution
of $\sqrt{\beta_1}$ and $\beta_2$ in the case of a normal distribution, giving
the confidence contours for $\sqrt{\beta_1}$ and $\beta_2$; but their
significance limits of $\beta_2$ for $\sqrt{\beta_1} = 0$ are also not useful in this
case.

TABLE 2. Significance limits of $\beta_2$ in order to test for
lognormality.


Sample $\alpha$ = 0.10 $\alpha$ = 0.05 $\alpha$ = 0.01
size lower upper lower upper lower upper
20 1.68 3.88 1.60 Gai, 1250 42:90 
30 1.78 31.85 1.70 Ae2G 1.60 4.85 
40 1.88 3.82 19 Ae21- 1.68 4.80 
50 Vee U7 3.80 1.87 4.18 a275 4575 
60 2.05 3578 94 4.15 jeSal 4.70 
70 Die AeZ, See 2.00 fe 12 1.87 4.66 
80 Papdits: 3.74 2.06 4.10 Wn 4.63 
90 222 B12 Drei: 4.08 dl. OW 4.60 
100 225 3.70 Zim 4.06 2.01 4.57 
150 228 3.64 22s B96 2.06 4.44 
200 Paaehil 3.60 Dea sipisie) 2 LO IS 
300 2.36 Shoe) aaa} 3.86 2.46 4.31 
400 2.41 Bro 2533 3.83 2222 4.27 
500 2S 3.53 2.38 3.80 2.28 4.24 


From the simulations mentioned in the previous section,
percentiles have been calculated in order to obtain the
significance limits for $\alpha$ = 0.10, 0.05 and 0.01. These significance
limits for different sample sizes are shown in Table 2.


5. AN APPLICATION 


In the Instituto di Malattie Infettive of Bari University, 
DNA-polymerase activity (in counts per minute; c.p.m.) have 
been observed in 70 chronic carriers of aes The distribution 


of these values has yielded the following statistics: 


xX. = 5 x = 2044 X = 458.1 s = 389.6 
min max 
vb, = 2.30 by = 8.32 


As $\sqrt{b_1}$ and $b_2$ are significantly different from 0 and 3, the
distribution of DNA-polymerase activity is thought to be
lognormal. Therefore, the suggested procedure has been applied for
parameter estimation. Results are:


$\hat\theta = 99.1$     $\hat\mu = 5.44$     $\hat\sigma = 0.995$


($\sqrt{\beta_1} = -0.02$)     $\beta_2 = 3.05$


As $\beta_2$ is not significantly different from 3, the lognormality
hypothesis has been accepted and the above values are the
parameter estimates. Particularly, $\hat\theta = 99$ has the important
interpretation of minimal radioactivity (in c.p.m.) existing when
analyses for the determination of DNA-polymerase activity were
carried out.

Thanks go to M. T. Boswell for assistance with the revision 
of this manuscript. 
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APPENDIX: EXISTENCE OF 6-ESTIMATES 


Lognormal observations tend to be positively skewed while
the logarithms of the observations follow a normal distribution
with skewness equal to zero. Consider real numbers
$x_1 < x_2 < \cdots < x_n$ with skewness index

$$\sqrt{b_1}(x) = (\sqrt{n})\Big[\sum (x_j - \bar{x})^3\Big] \Big/ \Big[\sum (x_j - \bar{x})^2\Big]^{3/2} > 0,$$

and let $z_j(\theta) = \log(x_j - \theta)$, $\theta < x_1$, with skewness index
$b(\theta) = \sqrt{b_1}(\theta)$. The object of this Appendix is to demonstrate
that there is at least one $\theta$ giving a skewness $b(\theta) = 0$.


Theorem.

$$\lim_{\theta \to -\infty} b(\theta) = \sqrt{b_1}(x) > 0, \qquad \text{and} \qquad \lim_{\theta \to x_1} b(\theta) = -(n-2)/\sqrt{n-1} < 0$$

(and therefore there exists at least one $\theta$ such that $b(\theta) = 0$).


Proof. Let $Z = \log(X - \theta)$ where the variate X assumes each of
the values $x_j$, $j = 1, 2, \ldots, n$, with probability 1/n. Recalling
that skewness is invariant to linear transformations, and writing

$$Z = \log[1 + (X - x_1)/(x_1 - \theta)] + \log(x_1 - \theta),$$

it is seen that Z has the same skewness as
$\log[1 + (X - x_1)/(x_1 - \theta)] = Y$. But as $\theta$ approaches $-\infty$, Y becomes
indistinguishable from $(X - x_1)/(x_1 - \theta)$, which has the same skewness as X.
This proves the first part of the theorem. For the second part,
let $M_i = E[Z^i] = \sum_j z_j^i / n$. As $\theta$ approaches $x_1$, the term $z_1$
approaches $-\infty$, while $z_2, z_3, \ldots, z_n$ remain bounded. Hence,

$$M_i / M_1^i \to n^{i-1}, \qquad i = 2, 3,$$

and $\mathrm{Var}(Z)/M_1^2 \to n - 1$. Now

$$b(\theta) = [M_3 - 3M_1 M_2 + 2M_1^3] / [\mathrm{Var}(Z)]^{3/2}.$$

Dividing numerator and denominator by $M_1^3 = -|M_1|^3$ and using the
preceding relations gives

$$b(\theta) \to -(n^2 - 3n + 2)/(n-1)^{3/2} = -(n-2)/(n-1)^{1/2}.$$

ON THE ASYMPTOTIC DISTRIBUTION OF THE 
MULTIVARIATE CRAMER-VON MISES AND 
HOEFFDING-BLUM-KIEFER-ROSENBLATT INDEPENDENCE 
CRITERIA. 


MIKLOS CSORGO 


Department of Mathematics and Statistics 
Carleton University 
Ottawa, Canada K1S 5B6 


SUMMARY. This exposition consists of two parts. The first
is devoted to surveying recent developments in the
asymptotic distribution theory of the multivariate Cramér-von Mises
statistic. The second part is a preview of some developments in
the asymptotic distribution theory of the Hoeffding-Blum-Kiefer-
Rosenblatt independence criterion, in that the 1980 joint results
quoted there with Derek S. Cotterill of the Department of
National Defence, Ottawa, have not yet appeared elsewhere.


KEY WORDS. Multivariate Cramér-von Mises statistic, Cramér-von 
Mises type tests of independence, distribution tables and rates 
of convergence for both, empirical processes, invariance princi- 
ples. 


1. ON THE LIMITING DISTRIBUTION OF THE MULTIVARIATE 
CRAMER-von MISES STATISTIC 


Let $Y_1, \ldots, Y_n$ be independent rv uniformly distributed over
the d-dimensional unit cube $I^d$ $(d \ge 1)$, and let $E_n$ be
the empirical distribution function of $Y_1, \ldots, Y_n$, i.e., for
$y = (y_1, \ldots, y_d) \in I^d$, $E_n(y)$ is $n^{-1}$ times the number of
$Y_j = (Y_{j1}, \ldots, Y_{jd})$, $j = 1, \ldots, n$, whose components are less
than or equal to the corresponding components of y, conveniently
written as

$$E_n(y) = E_n(y_1, \ldots, y_d) = n^{-1} \sum_{j=1}^{n} \prod_{i=1}^{d} I_{[0, y_i]}(Y_{ji}),$$

where $I_A$ denotes the indicator function of the set A.


Consider the uniform empirical process

$$\alpha_n(y) = n^{1/2}[E_n(y) - \lambda(y)], \qquad y \in I^d, \quad d \ge 1, \qquad (1)$$

where $\lambda(y) = \prod_{i=1}^{d} y_i$. It will be convenient for us to also think
about $\alpha_n(\cdot)$ in terms of continuous distribution functions F
on $R^d$. Let $\mathcal{F}$ be the class of continuous distribution functions
on d-dimensional Euclidean space $R^d$ $(d \ge 1)$, and let $\mathcal{F}_0$ be
the subclass consisting of every member of $\mathcal{F}$ which is a product
of its associated one-dimensional marginal distribution functions.
Let $X_1, \ldots, X_n$ be independent random d-vectors with a common
distribution function $F \in \mathcal{F}$. Let $F_n(\cdot)$ be the empirical
distribution function of $X_1, \ldots, X_n$, i.e., for $x = (x_1, \ldots, x_d) \in R^d$, $F_n(x)$
is $n^{-1}$ times the number of $X_j = (X_{j1}, \ldots, X_{jd})$, $j = 1, \ldots, n$,
whose components are less than or equal to the corresponding
components of x, namely

$$F_n(x) = F_n(x_1, \ldots, x_d) = n^{-1} \sum_{j=1}^{n} \prod_{i=1}^{d} I_{(-\infty, x_i]}(X_{ji}).$$

Consider the empirical process

$$\beta_n(x) = n^{1/2}(F_n(x) - F(x)), \qquad x = (x_1, \ldots, x_d) \in R^d, \quad d \ge 1.$$

Let $y_i = F_i(x_i)$ be the ith marginal distribution function of
$F \in \mathcal{F}$ and let $F_i^{-1}(\cdot)$ be its inverse. Now if $F \in \mathcal{F}_0$, then

$$\beta_n(x) = n^{1/2}\Big[F_n(x) - \prod_{i=1}^{d} F_i(x_i)\Big]$$
$$= n^{1/2}\{F_n(F_1^{-1}(y_1), \ldots, F_d^{-1}(y_d)) - \lambda(y)\}$$
$$= n^{1/2}[E_n(y) - \lambda(y)]$$
$$= \alpha_n(y), \qquad y = (y_1, \ldots, y_d) \in I^d, \quad d \ge 1, \qquad (2)$$

i.e., if $F \in \mathcal{F}_0$, then $\beta_n$ is distribution free.
As to $\alpha_n$, the following results are known.


Theorem A. Let $X_1, \ldots, X_n$ $(n = 1, 2, \ldots)$ be independent random
d-vectors with a common distribution function $F \in \mathcal{F}_0$ and let
$\alpha_n(\cdot)$ be as in (2). Then one can construct a probability space
$(\Omega, \mathcal{A}, P)$ with $\{\alpha_n(y) : y \in I^d\ (d \ge 1),\ n = 1, 2, \ldots\}$, a
sequence of Brownian bridges $\{B_n(y) : y \in I^d\ (d \ge 1)\}$ and a
Kiefer process $\{K(y,t) : y \in I^d\ (d \ge 1),\ t \ge 0\}$ on it so that
for any $\mu > 0$ there exists a $C > 0$ such that (cf. Csörgő and
Révész, 1975) for each n and for $d \ge 1$

$$P\Big\{\sup_{y \in I^d} |\alpha_n(y) - B_n(y)| > C (\log n)^{3/2}\, n^{-1/(2(d+1))}\Big\} \le n^{-\mu}, \qquad (3)$$

and whence

$$\sup_{y \in I^d} |\alpha_n(y) - B_n(y)| \overset{a.s.}{=} O\big(n^{-1/(2(d+1))} (\log n)^{3/2}\big),$$

$$\sup_{1 \le k \le n}\ \sup_{y \in I^d} |k^{1/2}\alpha_k(y) - K(y,k)| \overset{a.s.}{=} O\big(n^{(d+1)/(2(d+2))} (\log n)^{2}\big). \qquad (4)$$

Also if d = 1, then (cf. Komlós et al., 1975) for all n and x

$$P\Big\{\sup_{0 \le y \le 1} |\alpha_n(y) - B_n(y)| > n^{-1/2}(C \log n + x)\Big\} \le L e^{-\lambda x},$$

where C, L, $\lambda$ are positive absolute constants [e.g., (cf.
Tusnády, 1977a) they can be chosen as C = 100, L = 10, $\lambda$ = 1/50],
and

$$P\Big\{\sup_{1 \le k \le n}\ \sup_{0 \le y \le 1} |k^{1/2}\alpha_k(y) - K(y,k)| > (C \log n + x)\log n\Big\} < L e^{-\lambda x},$$

where again C, L, $\lambda$ are positive absolute constants, and whence

$$\sup_{0 \le y \le 1} |\alpha_n(y) - B_n(y)| \overset{a.s.}{=} O(n^{-1/2} \log n), \qquad (5)$$

$$\sup_{1 \le k \le n}\ \sup_{0 \le y \le 1} |k^{1/2}\alpha_k(y) - K(y,k)| \overset{a.s.}{=} O(\log^2 n). \qquad (6)$$

Further, if d = 2, then (cf. Tusnády, 1977) for all n and x

$$P\Big\{\sup_{y \in I^2} |\alpha_n(y) - B_n(y)| > n^{-1/2}(C \log n + x)\log n\Big\} < L e^{-\lambda x},$$

where C, L, $\lambda$ are positive absolute constants, and whence

$$\sup_{y \in I^2} |\alpha_n(y) - B_n(y)| \overset{a.s.}{=} O(n^{-1/2} \log^2 n). \qquad (7)$$


The respective a.s. rates of (3), (4), (6) and (7) are best
available, while that of (5) is best possible. For further
illuminating comments concerning rates in higher dimensions we
refer to Tusnády (1977b).


The Brownian bridges and the Kiefer process of the above 
theorem are Gaussian processes, defined in terms of a multi- 
parameter Wiener process as follows: 


D1. Wiener process: A separable Gaussian process

$$W(x) = \{W(x_1, \ldots, x_d) : 0 \le x_i < \infty,\ i = 1, \ldots, d\}$$

with $EW(x) = 0$ and covariance function

$$EW(x_1)W(x_2) = \lambda(x_1 \wedge x_2),$$

where $x_1 = (x_{11}, x_{12}, \ldots, x_{1d})$, $x_2 = (x_{21}, x_{22}, \ldots, x_{2d})$,
$x_1 \wedge x_2 = (x_{11} \wedge x_{21}, \ldots, x_{1d} \wedge x_{2d})$ and
$\lambda(x_1 \wedge x_2) = \prod_{i=1}^{d} (x_{1i} \wedge x_{2i})$.


D2. Brownian bridge:

$$B(x) = \{B(x_1, \ldots, x_d) : x \in I^d\} = \{W(x) - \lambda(x)W(1, \ldots, 1) : x \in I^d\},$$

with $\lambda(x) = \prod_{i=1}^{d} x_i$.
Whence $EB(x) = 0$ and $EB(x_1)B(x_2) = \lambda(x_1 \wedge x_2) - \lambda(x_1)\lambda(x_2)$.


D3. Kiefer process:

$$K(x,t) = \{K(x,t) : x \in I^d,\ t \ge 0\} = \{W(x,t) - \lambda(x)W(1, \ldots, 1, t) : x \in I^d,\ t \ge 0\}.$$

Whence $EK(x,t) = 0$ and

$$EK(x_1,t_1)K(x_2,t_2) = (t_1 \wedge t_2)\{\lambda(x_1 \wedge x_2) - \lambda(x_1)\lambda(x_2)\}.$$


Given $F \in \mathcal{F}_0$, here we are interested in the asymptotic
distribution of the multivariate Cramér-von Mises statistic

$$W_{n,d}^2 = \int_{R^d} \beta_n^2(x) \prod_{i=1}^{d} dF_i(x_i) = \int_{I^d} \alpha_n^2(y) \prod_{i=1}^{d} dy_i, \qquad d \ge 1, \qquad (8)$$

where $\beta_n(x)$, $\alpha_n(y)$, $y_i = F_i(x_i)$ are as in (2). Naturally, say
by (3), we have for $d \ge 1$ that

$$h(\alpha_n(\cdot)) \xrightarrow{D} h(B(\cdot)),$$

for every continuous functional h on the space of real valued
functions on $I^d$ endowed with the supremum topology, and whence
also that


we ila we = f 7 B (y)dy 


2 2 
= weds 9 
aa oe J Bi(y)dy = Wa(n), d (9) 


d 
with dy standing for I dy, from now on. A direct way of 


seeing (9) is via 


O(r, 4 (n) Cloglog n) 2) ae doa 3, 
4s : 
2 aS 0 logl ) fd =25 
[we 5 By wi (a) | a.s (0, (n) ¢ ogiog Wey F 3 
E 0(p, (n) (loglog n) *) if de= 1% 
(10) 
or via 
4; ” 
O(n rq (n) (loglog Nk) Pt ted 2 
- 2 HG 
|w? en n . f d K (yn) dy | a.§ iy 
ye I 0(p, (n) (Loglog n) *) if d= 1, 


(11) 
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ext ae 
no sot) (log ny?! be Ge el 
rn) = 
aS d+1 
Rae (oere ph gts Dar 
and = 
-l, 
n 710g 2 tees 1 
p,(m) = E 
n * log n LP i= 2 


The respective statements of (10) follow from (3), (7) and
(5) respectively and those of (11) by (4) and (6) respectively
when they are also combined with appropriate laws of iterated
logarithm. From (11), in turn, not only can we deduce that (9) is
true, but also a law of iterated logarithm for $W_{n,d}^2$ from that
of $\int_{I^d} K^2(y,n)\,dy$. For a proof of (10) and (11) we refer to that
of Corollary 1 in Csörgő (1979).
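For readers who wish to compute the statistic in (8) from data already transformed to the unit cube, the integral of the squared uniform empirical process can be evaluated in closed form. The Python sketch below is an illustration and not part of the paper: the function name is hypothetical, and the closed form comes from expanding the square of E_n(y) minus the product of the coordinates and integrating term by term, which gives (1/n) times the double sum over j, k of the product over i of (1 - max(Y_ji, Y_ki)), minus twice the sum over j of the product over i of (1 - Y_ji^2)/2, plus n/3^d.

```python
import numpy as np

def cvm_statistic(y):
    """W^2_{n,d} = n * integral over I^d of (E_n(y) - lambda(y))^2 dy, in closed form."""
    y = np.asarray(y, dtype=float)
    n, d = y.shape
    pair_max = np.maximum(y[:, None, :], y[None, :, :])      # shape (n, n, d)
    term_ee = np.prod(1.0 - pair_max, axis=2).sum() / n       # n * integral of E_n^2
    term_el = np.prod((1.0 - y ** 2) / 2.0, axis=1).sum()     # n * integral of E_n * lambda
    return term_ee - 2.0 * term_el + n / 3.0 ** d             # plus n * integral of lambda^2

# Hypothetical usage: 30 points uniform on the 3-dimensional unit cube.
rng = np.random.default_rng(3)
w2 = cvm_statistic(rng.uniform(size=(30, 3)))
```

For d = 1 this reduces to the familiar computing formula 1/(12n) plus the sum of squared differences between the ordered values and (2i - 1)/(2n), which provides a convenient check. The pairwise array needs on the order of n^2 d memory, so the sketch is intended for moderate n.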


In addition to (9), from Theorem A we can also prove rates
of convergence results for this convergence in distribution. Let
$V_{n,d}(x)$ be the distribution function of $W_{n,d}^2$ of (8) and let
$V_d(x)$ be that of $W_d^2$ of (9). Then (9) reads

$$\lim_{n \to \infty} P\{W_{n,d}^2 \le x\} = \lim_{n \to \infty} V_{n,d}(x) = V_d(x), \qquad d \ge 1. \qquad (12)$$

Put $\Delta_{n,d} = \sup_{0 < x < \infty} |V_{n,d}(x) - V_d(x)|$. Then we have
ee ws Ste 
Theorem B. (Gotze, 1979). A O(n ) for any Je->50): 
This theorem is the best available such result for A 1 so 
ee: | Ny 
far. Earlier S. Csorg¥ (1976) showed that A is O(n 210g n) 
’ 
and, on the basis of his complete asymptotic expansion. for the 
Laplace transform of we » he conjectured that A is of 
Det n/t 


order 1/n. This conjecture was further studied by S. Csorg8 and 
L. Stachd (1980). 
As to higher dimensions $d \ge 2$, nothing is known about the
exact distribution function $V_{n,d}$ (cf. (12)), and only the
characteristic function of $V_d$ (cf. (12)) is known (cf. Dugué,
1969; Durbin, 1970), and that (cf. Anderson and Darling, 1952;
Rosenblatt, 1952) $W_d^2$ may be written in the form

\[ W_d^2 = \sum_{k=1}^{\infty} \mu_k X_k^2, \tag{13} \]

where the $X_k$ are independent standard normal random variables
and the $\mu_k$ are the eigenvalues of the integral equation

\[ \int_{I^d} E\{B(x_1)B(x_2)\}\, f(x_2)\,dx_2 = \mu f(x_1) \]

with eigenfunctions f and kernel $EB(x_1)B(x_2)$ (cf. D2).
Whence, in order to tabulate $V_d$ ($d \ge 2$), one may try working
with a numerical inversion of the characteristic function of $V_d$,
or one may try to calculate a number of the necessary eigenvalues
for (13). Unfortunately both ways turn out to be quite difficult
to follow directly. Durbin (1970) succeeded in solving the latter
problem for d = 2, and Krivyakova, Martynov and Tyurin (1977)
for d = 3.


Using the characteristic function of Dugué (1969), in Cotterill
and Csörgő (1979) we obtain a recursive equation in the cumulants
of $W_d^2$, and then use the Cornish-Fisher asymptotic expansion to
calculate its critical values for d = 2, 3, ..., 50 at various
levels of rejection probabilities. These are within 3% of Durbin's
values for d = 2 and those of Krivyakova, Martynov and Tyurin
for d = 3. As far as we know there exist no other tables for
$d \ge 4$.


Since nothing is known about the exact distribution function
$V_{n,d}$ for $d \ge 2$, it is desirable to have a Theorem B type
result also for $\Delta_{n,d}$ when $d \ge 2$. As to the latter we have

Theorem C (Cotterill and Csörgő, 1979).


A Sadr. : 
O(n * log n) Ac d= 2, 
a (14) 
Ae, sis 
[cb ubaaad «5 eee nek Ca ee 


as Ly 
As far as we know, the rates of (14) are the only available
ones for $\Delta_{n,d}$ ($d \ge 2$) and these combined with Theorem B tell
the whole story as presently known for $d \ge 1$.


The proof of Theorem C is based on the respective statements
of (7) and (3) and on the following two lemmas of Cotterill and
Csörgő (1979).

Lemma A. The distribution function $V_d$ of $W_d^2$ ($d \ge 2$) is
arbitrarily many times differentiable and for an arbitrary integer
p, $V_d^{(p)}(x) \to 0$ as $x \downarrow 0$.

Lemma B. For any real $p \ge 0$ and integers q = 0, 1, 2, ... the
function $x^p v_d^{(q)}(x)$ ($d \ge 1$) is bounded on $(0, \infty)$.

Remark. For d = 1, Lemmas A and B were already known (cf.
Lemmas 1 and 8 in S. Csörgő (1976)), and we had to prove them
only for $d \ge 2$. Also $V_d^{(q)}$ stands for the qth derivative of
$V_d$ and $v_d^{(q)}$ stands for the qth derivative of the density
function $v_d$ of $W_d^2$ ($d \ge 1$).


2. ON THE LIMITING DISTRIBUTION OF THE HOEFFDING-BLUM-KIEFER- 
ROSENBLATT INDEPENDENCE CRITERION 

Let again $\mathcal{F}$ be the class of continuous distribution functions
on d-dimensional Euclidean space $R^d$, $d \ge 2$, and $\mathcal{F}_0$ be
the subclass consisting of every member of $\mathcal{F}$ which is a product
of its associated one-dimensional marginal distribution functions.
Let $X_1, \dots, X_n$ be independent random d-vectors with common
unknown distribution $F \in \mathcal{F}$, and suppose it is desired to test
the null hypothesis

\[ H_0: F \in \mathcal{F}_0 \quad \text{against the alternative} \quad H_1: F \in \mathcal{F} - \mathcal{F}_0. \tag{15} \]

Let again $F_n$ be the empirical distribution function of
$X_1, \dots, X_n$; i.e., for $x = (x_1, \dots, x_d) \in R^d$, $F_n(x)$ is $n^{-1}$
times the number of $X_j$, $j = 1, \dots, n$,
whose components are less than or equal to the corresponding
components of x, conveniently written as in (1). Let $F_{ni}$ be
the marginal empirical distribution function of the ith component
of $X_j$, i.e.,

\[ F_{ni}(x_i) = F_n(\infty, \dots, \infty, x_i, \infty, \dots, \infty), \quad i = 1, \dots, d, \]

and define

\[ T_n(x) = T_n(x_1, \dots, x_d) = F_n(x) - \prod_{i=1}^{d} F_{ni}(x_i), \quad d \ge 2. \]


Hoeffding (1948), when d = 2, and Blum, Kiefer and
Rosenblatt (1961), when $d \ge 2$, studied the problem of testing
$H_0$ vs. $H_1$ via appropriate Cramér-von Mises type functionals of
their multivariate empirical process $n^{1/2} T_n(x)$, obtaining the
characteristic functions of the limiting distributions of these
functionals and providing tables for the corresponding distribution
functions in the bivariate case. Strong invariance principles
for the process $n^{1/2} T_n(x)$ were proved by Csörgő (1979). In order
to summarize these asymptotic results and for the sake of
stating further ones, we need some further notation.

Let $F_i(x_i)$ be the ith marginal distribution function
of F and let $F_i^{-1}(\cdot)$ be its inverse. Define the mapping
$L^{-1}: I^d \to R^d$ by

\[ L^{-1}(y_1, \dots, y_d) = (F_1^{-1}(y_1), \dots, F_d^{-1}(y_d)), \quad y = (y_1, \dots, y_d) \in I^d \ (d \ge 2). \]

For our purposes it will be convenient sometimes to view $T_n$ in
terms of the latter, and we let

\[ t_n(y) = T_n(L^{-1}(y)) = T_n(F_1^{-1}(y_1), \dots, F_d^{-1}(y_d)) = F_n(L^{-1}(y)) - \prod_{i=1}^{d} F_{ni}(F_i^{-1}(y_i)). \tag{16} \]

Then $H_0: F \in \mathcal{F}_0$ is equivalent to $H_0: F(L^{-1}(y)) = \prod_{i=1}^{d} y_i = \lambda(y)$,
i.e., under $H_0$, $T_n$ is distribution free, so we may take F
to be the uniform distribution on $I^d$ ($d \ge 2$) and can then
study $T_n$ in terms of $t_n$.


Define the sequence of Gaussian processes $\{T^{(n)}(y): y \in I^d\}$
by

\[ T^{(n)}(y) = B_n(y) - \sum_{i=1}^{d} B_n(1, \dots, 1, y_i, 1, \dots, 1) \prod_{j \ne i} y_j, \quad y \in I^d \ (d \ge 2), \tag{17} \]

where $\{B_n(y): y \in I^d\}$ is a sequence of Brownian bridges, i.e.,
for each n, $B_n(\cdot)$ is a separable Gaussian process with $EB_n(x) = 0$
and $EB_n(x)B_n(y) = \lambda(x \wedge y) - \lambda(x)\lambda(y)$ ($x, y \in I^d$) (cf. D2).

Define the Gaussian process $\{T(y,t): y \in I^d, t \ge 0\}$ by

\[ T(y,t) = K(y,t) - \sum_{i=1}^{d} K(1, \dots, 1, y_i, 1, \dots, 1, t) \prod_{j \ne i} y_j, \quad y \in I^d \ (d \ge 2), \ t \ge 0, \tag{18} \]

where $\{K(y,t): y \in I^d, t \ge 0\}$ is a Kiefer process, i.e., a
separable Gaussian process with $EK(x,t) = 0$ and
$EK(x,s)K(y,t) = (s \wedge t)\{\lambda(x \wedge y) - \lambda(x)\lambda(y)\}$ ($0 \le s,t < \infty$, $x, y \in I^d$)
(cf. D3).


Obviously $ET^{(n)}(y) = ET(y,t) = 0$, and simple but somewhat
tedious calculations yield the covariance functions

\[ ET^{(n)}(x)T^{(n)}(y) = \prod_{i=1}^{d} (x_i \wedge y_i) + (d-1) \prod_{i=1}^{d} x_i y_i - \sum_{i=1}^{d} (x_i \wedge y_i) \prod_{j \ne i} x_j y_j = \rho(x,y), \quad \text{for all } n, \tag{19} \]

and $ET(x,s)T(y,t) = (s \wedge t)\rho(x,y)$,
where $x = (x_1, \dots, x_d)$, $y = (y_1, \dots, y_d) \in I^d$ and $s, t \ge 0$.
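
As an illustrative check, the covariance $\rho(x,y) = \prod_i (x_i \wedge y_i) + (d-1)\prod_i x_i y_i - \sum_i (x_i \wedge y_i)\prod_{j \ne i} x_j y_j$ can be evaluated numerically. The sketch below is not taken from the paper; the function names are hypothetical. For d = 2 the expression should factor as $(x_1 \wedge y_1 - x_1 y_1)(x_2 \wedge y_2 - x_2 y_2)$, which the second function uses as a cross-check.

```python
from math import prod


def rho(x, y):
    """Covariance rho(x, y) for points x, y in the unit cube I^d (d >= 2)."""
    d = len(x)
    mins = [min(a, b) for a, b in zip(x, y)]      # x_i ^ y_i
    prods = [a * b for a, b in zip(x, y)]         # x_i * y_i
    total = prod(mins) + (d - 1) * prod(prods)
    for i in range(d):
        # product of x_j * y_j over all j != i
        rest = prod(prods[j] for j in range(d) if j != i)
        total -= mins[i] * rest
    return total


def rho_factored_d2(x, y):
    """Factored form of rho for d = 2, used only as a cross-check."""
    return (min(x[0], y[0]) - x[0] * y[0]) * (min(x[1], y[1]) - x[1] * y[1])
```

A point with all coordinates but one equal to 1 gives covariance zero, in line with the process vanishing on those edges of the cube.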


As to $t_n = T_n(L^{-1}(y))$, the following results are known
(cf. Theorems 3 and 4 in Csörgő, 1979).

Theorem D (Csörgő, 1979). Let $X_1, \dots, X_n$, n = 1, 2, ..., be
independent random d-vectors with a common distribution function
$F \in \mathcal{F}_0$ and let $t_n$ be as in (16). Then one can construct a
probability space $(\Omega, A, P)$ with $\{t_n(y): y \in I^d \ (d \ge 2),
n = 1, 2, \dots\}$, a sequence of Gaussian processes $\{T^{(n)}(y): y \in I^d$
$(d \ge 2)\}$, defined as in (17), and a Gaussian process
$\{T(y,t): y \in I^d \ (d \ge 2), t \ge 0\}$, defined as in (18), such
that:
(i) If d= 2, then for any z and with [6.5 e/D)-, oul
positive absolute constants 


t% os 
P{sup . [nt_(y) - t™ (yy | >n “(C log n+z)log n} < De “7, 
Vice: 


(20) 
h 2S. +; 2 
whence’ sup eh? Cs (y) - r! re i O(n “Tog ny): 
yet 
(ii) 1f | d- 33, “then forvanyiep > 0,5 there exists a 
C > 0 such that 
ie ae di = 
P{sup gin t,(y) ~ 1°) (yy | > C(log ea party ree aan 
ye I 
(21) 
whence 
=i 
Ls Sie —45(d+1 Bia 
sup g|n®e_(y)- T™ yy] 925° oc 24) “Gog n)3/%), (22 
yel 
and also 
d+ + 
sup sup gl Kt, (y) - Tey, k) | FeSO (ny (d+1)/2(¢ ge They et aes 


It follows from (17) and (18) that for each n

\[ \{T^{(n)}(y): y \in I^d\} \stackrel{D}{=} \{T(y,1): y \in I^d\}. \tag{24} \]

Define the Gaussian process $\{T(y): y \in I^d \ (d \ge 2)\}$ via
(18) by

\[ T(y) = T(y,1) = B(y) - \sum_{i=1}^{d} B(1, \dots, 1, y_i, 1, \dots, 1) \prod_{j \ne i} y_j, \quad y = (y_1, \dots, y_d) \in I^d \ (d \ge 2), \tag{25} \]

where $\{B(y): y \in I^d\}$ is a Brownian bridge. Thus $T(\cdot)$ has
mean zero and the covariance function $\rho(x,y)$ of (19).

By (22), or (23) combined with (24) and (25) we get that
under $H_0$


\[ h(n^{1/2} t_n(\cdot)) \xrightarrow{D} h(T(\cdot)) \]

for every continuous functional h on the space of real valued
functions on $I^d$ endowed with the supremum topology.


For example, given $F \in \mathcal{F}_0$, we get

\[ A_{n,d} = n^{1/2} \sup_{y \in I^d} |t_n(y)| \xrightarrow{D} A_d = \sup_{y \in I^d} |T(y)|, \quad d \ge 2, \tag{26} \]

and

\[ C_{n,d} = n \int_{R^d} T_n^2(x) \prod_{i=1}^{d} dF_i(x_i) = n \int_{I^d} t_n^2(y)\,dy \xrightarrow{D} C_d = \int_{I^d} T^2(y)\,dy, \quad d \ge 2, \tag{27} \]

with $dy = \prod_{i=1}^{d} dy_i$.

As to how near the functionals $A_{n,d}$ and $A_d$, resp. $C_{n,d}$
and $C_d$, are to each other, we refer to (4.47), (4.48), resp. to
(4.49), (4.50), of Corollary 1 in Csörgő (1979), which, in turn,
also prove (26), resp. (27), directly and they themselves are
also a direct consequence of Theorem D.


The calculation of the distribution of the Kolmogorov-Smirnov
type rv $A_d$ seems very difficult. Indeed, one does not even
know the distribution of $\sup_{y \in I^d} |B(y)|$ nor that of
$\sup_{y \in I^d} |W(y)|$, where $\{B(y): y \in I^d\} = \{W(y) - \lambda(y)W(1): y \in I^d\}$
is a Brownian bridge and $\{W(y): y \in I^d\}$ is a Wiener process. For
sharp inequalities for the distribution of $\sup_{y \in I^d} W(y)$ we refer
to Goodman (1976). It was falsely claimed by Zincenko (1975)
that, just like in the case of d = 1, we have
$P\{\sup_{y \in I^d} W(y) > a\} = 2P\{W(1) > a\}$ also for $d \ge 2$. This created
a bit of confusion as to the relevancy of known results (cf.
MR52#1912, MR54#11532, 11533).


The calculation of the distribution of the rv $C_d$ is also
not easy. For d = 2, Blum, Kiefer and Rosenblatt (1961)
obtained the characteristic function of $C_2$ and tabulated its
distribution via numerical inversion of its characteristic function.
In order to enable ourselves to tabulate approximate critical
values of the r.v. $C_{n,d}$ for $d \ge 2$ we (Cotterill and
Csörgő, 1980) find the first five cumulants and then use the
Cornish-Fisher asymptotic expansion. Details as to how to calculate
approximate critical values for all $d \ge 2$ and tables for
the "usual" levels of significance for d = 2 to 20 are given
in Cotterill and Csörgő (1980).
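
To make the Cornish-Fisher step concrete, here is a minimal sketch of how a quantile can be approximated from cumulants. It uses the standard expansion through the fourth cumulant only, whereas the authors report using five cumulants; the function name and the chi-square test case are illustrative assumptions, not material from the cited tables.

```python
from math import sqrt
from statistics import NormalDist


def cornish_fisher_quantile(k1, k2, k3, k4, prob):
    """Approximate the prob-quantile of a distribution from its first four
    cumulants k1..k4, using the fourth-order Cornish-Fisher expansion."""
    z = NormalDist().inv_cdf(prob)       # standard normal quantile
    sigma = sqrt(k2)
    g1 = k3 / k2 ** 1.5                  # standardized third cumulant (skewness)
    g2 = k4 / k2 ** 2                    # standardized fourth cumulant (excess kurtosis)
    w = (z
         + g1 * (z * z - 1.0) / 6.0
         + g2 * (z ** 3 - 3.0 * z) / 24.0
         - g1 * g1 * (2.0 * z ** 3 - 5.0 * z) / 36.0)
    return k1 + sigma * w


# Illustration: chi-square with 10 d.f. has cumulants 10, 20, 80, 480.
approx_95 = cornish_fisher_quantile(10.0, 20.0, 80.0, 480.0, 0.95)
```

For the chi-square illustration the approximation lands close to the tabulated 95% point of about 18.31, which gives a feel for the accuracy such expansions can reach when the cumulants are known.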


Let $\Gamma_{n,d}(x)$ be the distribution function of the rv $C_{n,d}$
and let $\Gamma_d(x)$ be that of the rv $C_d$. Then (27) reads

\[ \lim_{n\to\infty} P\{C_{n,d} \le x\} = \lim_{n\to\infty} \Gamma_{n,d}(x) = P\{C_d \le x\} = \Gamma_d(x), \quad d \ge 2. \tag{28} \]

Since, for various values of n, nothing is known about
the exact distribution function $\Gamma_{n,d}$, in addition to (28) it
is also of interest to estimate the distance

\[ \nabla_{n,d} = \sup_{0<x<\infty} |\Gamma_{n,d}(x) - \Gamma_d(x)|, \quad d \ge 2. \tag{29} \]


The statistic $C_{n,d}$ of (27) cannot be used to test the
hypothesis $H_0$ of (15) unless $F \in \mathcal{F}_0$ is completely specified.
In some other situations $C_{n,d}$ might come in handy of course.
Hoeffding (1948), and Blum, Kiefer and Rosenblatt (1961) suggested,
as critical region for $H_0$, large values of

\[ n \int_{R^d} T_n^2(x)\,dF_n(x), \quad d \ge 2, \]

or those of

\[ \tilde{C}_{n,d} = n \int_{R^d} T_n^2(x) \prod_{i=1}^{d} dF_{ni}(x_i), \quad d \ge 2. \tag{30} \]

These two statistics are equivalent in that both converge in
distribution to the rv $C_d$ of (27).
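
As a rough sketch of how the first of these statistics can be computed from data: integrating against the empirical distribution $F_n$ places mass 1/n on each observation, so $n \int T_n^2\,dF_n$ reduces to the sum of $T_n^2$ over the sample points. The code below assumes this reading; all function names are hypothetical and the implementation is a naive O(n^2) one meant only for illustration.

```python
def empirical_cdf(sample, point):
    """F_n(point): fraction of observations that are <= point componentwise."""
    hits = sum(all(c <= p for c, p in zip(obs, point)) for obs in sample)
    return hits / len(sample)


def marginal_cdf(sample, i, value):
    """F_ni(value): fraction of observations whose i-th component is <= value."""
    return sum(obs[i] <= value for obs in sample) / len(sample)


def t_n(sample, point):
    """T_n(point) = F_n(point) minus the product of the marginal empirical d.f.'s."""
    product = 1.0
    for i, value in enumerate(point):
        product *= marginal_cdf(sample, i, value)
    return empirical_cdf(sample, point) - product


def independence_statistic(sample):
    """n * integral of T_n^2 dF_n, i.e. the sum of T_n(X_j)^2 over the sample."""
    return sum(t_n(sample, obs) ** 2 for obs in sample)


# A perfectly monotone bivariate sample of size 4 (strong dependence).
monotone = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
value = independence_statistic(monotone)
```

Large values point away from independence; critical values would have to come from the limiting distribution of $C_d$ discussed in the text.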


Recently DeWet (1979) studied a version of (30) in the case
of d = 2 with some nonnegative weight functions multiplying the
integrand $T_n^2$ of $\tilde{C}_{n,d}$.

Concerning rates of convergence for $C_{n,d}$ ($d \ge 2$), we have

Theorem E (Cotterill and Csörgő, 1980). For the distance $\nabla_{n,d}$
of (29) we have


sd 


/2 


“is (dtl). 2. Kipe. Bois teuede aie ae 


O(n 


n,d at os 4 
O(a? log ye de BF (31) 


As far as we know the rates of (31) for $\nabla_{n,d}$ are the only
available ones so far.

The proof of Theorem E is based on the respective statements
of (20) and (21) and on the following two lemmas of Cotterill and
Csörgő (1980).

Lemma C. The distribution function $\Gamma_d$ of the rv $C_d$ ($d \ge 2$)
is arbitrarily many times differentiable and for an arbitrary
integer p, $\Gamma_d^{(p)}(x) \to 0$ as $x \downarrow 0$.

Lemma D. For any real $p \ge 0$ and integers q = 0, 1, 2, ... the
function $x^p \gamma_d^{(q)}(x)$ ($d \ge 2$) is bounded on $(0, \infty)$.

Here $\Gamma_d^{(q)}$ stands for the qth derivative of $\Gamma_d$ and $\gamma_d^{(q)}$
for that of the density function $\gamma_d$ of $C_d$ ($d \ge 2$).
COMPLETE INDEPENDENCE IN THE MULTIVARIATE
NORMAL DISTRIBUTION 


GOVIND S. MUDHOLKAR 
University of Rochester 
PERLA SUBBAIAH 

Oakland University 


SUMMARY. Testing complete independence is one of the simplest
problems concerning the covariance structure of a set of measurements.
A stepwise procedure proposed by Roy and Bargmann (1958)
and a trace criterion due to Nagao (1973) are two well-known
competitors of the likelihood ratio test of the hypothesis derived
assuming multivariate normality. We consider some modifications
of the Roy-Bargmann procedure based on combinations of independent
tests and find them to be asymptotically equivalent to the
likelihood ratio test, which is optimal in terms of the exact slopes.
The operating characteristics of various tests with samples of
moderate size are examined empirically.


KEY WORDS. Combination of tests, exact slopes, stepdown procedure. 


1. INTRODUCTION AND SUMMARY 


Let $X_1, X_2, \dots, X_N$ be a random sample from a p-variate normal
population with covariance matrix $\Sigma$. One of the simplest
problems concerning the covariance structure of the multivariate
normal distribution is that of testing the complete independence of
the p measurements comprising the vectors $X_i$. The likelihood
ratio test for complete independence, which depends upon the
determinant |R| of the correlation matrix, was derived by Wilks
(1935). The exact distribution of the likelihood ratio statistic
is discussed and tabulated by Mathai and Katiyar (1979). An
alternative solution termed a step-down procedure, which consists
of p-1 independent tests, was proposed by Roy and Bargmann
(1958). This procedure, unlike the likelihood ratio test, permits
post-hoc analysis of the nature of dependence in case of a
rejection of the null hypothesis and depends only upon the well
tabulated F-distribution for its implementation.


In this paper we introduce a class of tests asymptotically
equivalent, in terms of the exact Bahadur slopes, to the likelihood
ratio test, which is optimal in this sense. The presently
available methods of testing complete independence are summarized
in Section 2. The new tests are introduced and shown to be
Bahadur-optimal in Section 3. Section 4 contains a Monte Carlo
comparison of these tests with the likelihood ratio test and the
step-down procedure when the samples are of moderate size. The
empirical study also includes a test proposed by Nagao (1973).


2. SOME TESTS OF COMPLETE INDEPENDENCE 


Let R be the correlation matrix of a sample of size N
from the $N_p(\mu, \Sigma)$ population. The likelihood ratio test for


+50) = J rejects it if 


> 


where the critical constant c may be obtained from Mathai and
Katiyar (1979), or by using approximations such as Box's
and Bartlett's, discussed by Mudholkar, Trivedi, and Lin (1980).
Nagao (1973) noted that asymptotically $-2 \log \Lambda$ is a


$\chi^2$-variable when the null hypothesis is true but after suitable
normalizing it is an asymptotic Gaussian variable for any fixed
alternative. He suggested regarding $\tau^2 = \mathrm{tr}(\Sigma \Sigma_0^{-1} - I)^2$, which
is proportional to the variance of this normal distribution, as a
noncentrality parameter, i.e., a measure of departure from the
null hypothesis, and proposed a consistent estimator

\[ T = \frac{N-1}{2}\,\mathrm{tr}(R - I)^2 \]
of a multiple of $\tau^2$ as a test statistic for $H_0$. He obtained the
asymptotic expansion for T in the form

\[ P(T \le x) = P_f + \frac{1}{N}\,[a_6 P_{f+6} + a_4 P_{f+4} + a_2 P_{f+2} + a_0 P_f] + O(N^{-2}), \tag{1} \]

where $f = p(p-1)/2$ and $P_f = P[\chi_f^2 \le x]$ and
$a_6 = (p^3 - 3p^2 + 2p)/12$, $a_4 = (-2p^3 + 3p^2 - p)/8$, and
$a_2 = (p^3 - p)/4$. He showed that it is satisfactory for n = 100.


Roy and Bargmann (1958) consider the null hypothesis of
complete independence in the form

\[ H_0 = \bigcap_{i=2}^{p} \{H_{0i}: \rho^2_{i \cdot 12 \ldots (i-1)} = 0\}, \]

where $\rho_{i \cdot 12 \ldots (i-1)}$ is
the multiple correlation between $X_i$ and $(X_1, X_2, \dots, X_{i-1})$, and
note that the sample step-down multiple correlation coefficients
$R^2_{i \cdot 12 \ldots (i-1)}$ are independently distributed when $\Sigma$ is diagonal.
They propose rejecting the null hypothesis when at least one of
the component hypotheses $H_{0i}$ is rejected by the usual test for
$H_{0i}$, i.e., when $R^2_{i \cdot 12 \ldots (i-1)} \ge c_i$, a constant. This procedure is
simple to implement as the independently distributed

\[ F_i = \frac{N-i}{i-1} \cdot \frac{R^2_{i \cdot 12 \ldots (i-1)}}{1 - R^2_{i \cdot 12 \ldots (i-1)}} \]

have variance ratio distributions with (i-1, N-i) d.f.,
i = 2, 3, ..., p. However, the procedure does require an a priori
ordering among the measured variables and a decision regarding
the levels $\alpha_i$ of the component tests which, because of the
independence, are related to the allowable overall type I error $\alpha$
by $(1-\alpha) = \prod_{i=2}^{p} (1-\alpha_i)$. It is common to take $\alpha_i = 1-(1-\alpha)^{1/(p-1)}$,
i = 2, ..., p. Roy and Bargmann gave the confidence bounds
associated with this step-down procedure which can be used to gain an
understanding of the nature of dependence in case $H_0$ is rejected.
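
The equal allocation of the overall type I error among the p-1 independent component tests can be sketched as follows. The relation used is $(1-\alpha) = \prod_{i=2}^{p}(1-\alpha_i)$ with all $\alpha_i$ equal; the function names are hypothetical and chosen only for this illustration.

```python
def component_level(alpha, p):
    """Common level alpha_i = 1 - (1 - alpha)^(1/(p-1)) for the p-1 component tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / (p - 1))


def overall_level(levels):
    """Overall type I error implied by independent component tests at the given levels."""
    keep = 1.0
    for a in levels:
        keep *= (1.0 - a)   # probability that no component test rejects
    return 1.0 - keep


# Example: p = 4 variables, overall level .05, three component tests.
alpha_i = component_level(0.05, 4)
recovered = overall_level([alpha_i] * 3)
```

With p = 4 and an overall level of .05 each component test is run at roughly .017, and multiplying the three acceptance probabilities recovers the overall level.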


3. A CLASS OF B-OPTIMAL TESTS 


The problem of the allocation of the overall type I error
among the component tests of the stepdown procedure may be
avoided by considering, instead of the variance ratio statistics
$F_{i+1}$, the P-values $P_i$ associated with the individual tests
i = 1, ..., k, where k = p-1. Since the statistics $F_{i+1}$ are
independent under $H_0$, the P-values $P_i$ have independent uniform
null distributions. These can therefore be combined variously
to construct an overall test for $H_0$. The problem of combining
independent tests of significance is classical and the literature
on the subject is extensive. It is well reviewed in Liptak (1958),
Oosterhoff (1969), George (1977) and Mudholkar and George (1979).


A combination procedure for the P-values $P_1, P_2, \dots, P_k$
associated with k independent tests of significance for hypotheses
$H_{0i}: \theta_i = \theta_{0i}$ vs. $H_{1i}: \theta_i > \theta_{0i}$, i = 1, 2, ..., k, is based
upon a combination statistic $\Psi(P_1, \dots, P_k)$ which is used for
testing the overall hypothesis $H_0 = \cap H_{0i}$ vs. the alternative
$H_1 = \cup H_{1i}$. The overall null hypothesis $H_0$ is rejected when
$\Psi(P_1, \dots, P_k)$ is large. The following are some of the well known
combination statistics: (i) the earliest proposed
$\Psi_T = \max \{-2 \log P_i\} = -2 \log \min P_i$ due to Tippett; (ii) $\Psi_F = \sum -2 \log P_i$ due
to Fisher; (iii) $\Psi_N = \sum \Phi^{-1}(1-P_i)$, $\Phi$ being the c.d.f. of the
standard normal, considered by Liptak (1958); and
(iv) $\Psi_L = -\sum \log [P_i/(1-P_i)]$ introduced by George (1977). These
statistics have simple null distributions. $\Psi_T$ is distributed
as the largest order statistic of a sample of size k from the exponential
population with mean 2, $\Psi_F$ is a $\chi^2_{2k}$ variable, $\Psi_N$ has the N(0,k) distribution,
and $\Psi_L$, a k-fold convolution of the logistic distribution, is
approximately a constant multiple of a t variable with 5k+4 degrees of freedom.
It is easily seen that the stepdown procedure with equal $\alpha$'s
is equivalent to the Tippett combination of its P-values. An
account of various studies of the operating characteristics of
combination methods in the Neyman-Pearson and decision theoretic
framework may be found in Oosterhoff (1969). However, none of
the methods can be preferred on the basis of these works.
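
The four combination statistics named above (Tippett, Fisher, Liptak, and George's logit) can be sketched in a few lines. This is an illustrative implementation, not code from the paper: the function names are hypothetical, each statistic is written so that large values are significant, and Tippett's statistic is taken as $-2 \log \min P_i$.

```python
from math import log
from statistics import NormalDist


def tippett(pvals):
    """Tippett: based on the smallest P-value, written so that large is significant."""
    return -2.0 * log(min(pvals))


def fisher(pvals):
    """Fisher: sum of -2 log P_i, chi-square with 2k d.f. under the null."""
    return sum(-2.0 * log(p) for p in pvals)


def liptak(pvals):
    """Liptak: sum of standard normal quantiles of 1 - P_i, N(0, k) under the null."""
    nd = NormalDist()
    return sum(nd.inv_cdf(1.0 - p) for p in pvals)


def logit(pvals):
    """George's logit statistic: minus the sum of log[P_i / (1 - P_i)]."""
    return -sum(log(p / (1.0 - p)) for p in pvals)


neutral = [0.5, 0.5, 0.5]      # P-values carrying no evidence either way
small = [0.01, 0.20, 0.03]     # P-values suggesting a departure from the null
```

All four statistics grow as the P-values shrink, so each leads to an upper-tail test against its own null distribution.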

Littell and Folks (1971) examined Bahadur ARE's of various methods
and found that among all monotone combination procedures Fisher's
is optimal according to this criterion. Mudholkar and George
(1979) showed that $\Psi_L$ has the same exact slope as $\Psi_F$ and is
consequently optimal. For a recent account of this aspect see
Berk and Cohen (1979). These studies of asymptotic relative
efficiencies concern combinations of independent tests, but can
be extended to the methods of combining tests which are independent
under the null hypothesis only.


The exact slope used in defining the Bahadur ARE of a test at
an alternative is the rate at which -2 log (P-value of the
test) increases with respect to the sample size n, when the
alternative is true. Specifically, let large values of a statistic
$T_n$ be significant in testing $H_0: \theta \in \Theta_0$ vs. $H_1: \theta \notin \Theta_0$,
$F_n(t)$ denote the null distribution function of $T_n$ and
$P_n(T_n) = 1 - F_n(T_n)$ be the associated P-value. Then
$c(\theta) = \lim_{n\to\infty} \{-(2/n) \log P_n(T_n)\}$, when it exists, is the exact
slope of $T_n$. $c(\theta)$ is often obtained using the following result
due to Bahadur (1971, p. 27).

Proposition. Suppose that $\lim_{n\to\infty} T_n/n = b(\theta)$ a.s. for each
$\theta \notin \Theta_0$. Let $\rho_n(t) = -n^{-1} \log[1 - F_n(nt)]$ and suppose that
$\lim_{n\to\infty} \rho_n(t) = \rho(t)$ exists and is continuous on an open interval
containing the range of $b(\theta)$. Then the exact slope of $T_n$ is
$c(\theta) = 2\rho(b(\theta))$.

Remark. $\rho(t)$ is sometimes referred to as the index of the sequence
$\{T_n\}$ or of the sequence of distributions $\{F_n\}$ of $\{T_n\}$.
b 


Now consider the present problem of testing the null hypothesis
$H_0$ that the covariance matrix $\Sigma$ is diagonal. The
step-down procedure, which involves testing the component hypotheses
$H_{0i}: \rho_{i+1 \cdot 12 \ldots i} = 0$ with

\[ F_{i+1} = \frac{N-i-1}{i} \cdot \frac{R^2_{i+1 \cdot 12 \ldots i}}{1 - R^2_{i+1 \cdot 12 \ldots i}}, \tag{2} \]

i = 1, 2, ..., k, may be modified by combining these independent
(only under $H_0$) tests using a combination statistic
$\Psi(P_1, \dots, P_k)$, where the $P_i$'s are the P-values associated with
the $F_{i+1}$'s. We are interested in the statistics of the form
$\Psi(P_1, \dots, P_k) = \sum_i \phi_i(P_i)$, where the $\phi_i(P_i)$ are monotone
decreasing with index $\rho_i(t) = t$, i = 1, 2, ..., k. Let L denote
the family of these tests. Note that if $\phi_i = G_i^{-1}(1-P_i)$ then
under $H_0$, $\Psi(P_1, \dots, P_k)$ is distributed as the convolution of
$G_1, G_2, \dots, G_k$. In fact L includes in this manner the combinations
of the step-down tests based on Fisher's method with
$\phi_i(t) = -2 \log t$ and on the logit method with
$\phi_i(t) = -\log[t/(1-t)]$.

The tests in the family L are asymptotically optimal and
equivalent to the likelihood ratio test for $H_0$. In order to
demonstrate this, i.e., to obtain the exact slope of
$\Psi(P_1, \dots, P_k) = \sum \phi_i(P_i)$, we examine $\phi_i(P_i)$, which has the same
slope as $F_{i+1}$ given in (2). Since $F_{i+1}$ is a variance-ratio
with (i, N-i-1) d.f., for an alternative $\rho_{i+1 \cdot 12 \ldots i}$,
$\{F_{i+1}/(N-i-1)\}$ converges in probability to
$\{\rho^2_{i+1 \cdot 12 \ldots i} / [i(1 - \rho^2_{i+1 \cdot 12 \ldots i})]\}$. This may be rigorously
proved using the fact that $[i/(N-i-1)] F_{i+1}$ is equivalent in law
to

\[ \{\chi^2_{i-1} + [Z + \rho_{i+1 \cdot 12 \ldots i} (\chi^2_{N-1})^{1/2} / (1 - \rho^2_{i+1 \cdot 12 \ldots i})^{1/2}]^2\} / \chi^2_{N-i-1}, \]

where Z ~ N(0,1), and Z and the chi-square variables are mutually independent.
Moreover, it can be shown (e.g., see Bahadur, 1971, p. 13) that
$-n^{-1} \log[1 - F_n(nt)] \to \frac{1}{2} \log(1 + it)$, where $F_n$ here denotes the
null distribution function of $F_{i+1}$. Hence by the above
proposition, the exact slope of $F_{i+1}$ or its monotone function
$\phi_i(P_i)$ is

\[ \log\{1 + \rho^2_{i+1 \cdot 12 \ldots i} / (1 - \rho^2_{i+1 \cdot 12 \ldots i})\} = -\log(1 - \rho^2_{i+1 \cdot 12 \ldots i}). \]

Now, in view of the results by Berk and Cohen (1979) it follows
that the index of $\sum \phi_i(P_i)$ is the same as the index $\rho_i(t) = t$
of each $\phi_i(P_i)$ and consequently the exact slope of $\Psi(P_1, \dots, P_k)$
is

\[ c(P) = -\sum_{i=1}^{k} \log(1 - \rho^2_{i+1 \cdot 12 \ldots i}) = -\log |P|, \tag{3} \]

where |P| denotes the determinant of the population correlation
matrix $P = (\rho_{ij})$. By particularizing the result in Section 3.4
of Hsieh (1979) it is seen that the likelihood ratio test for
complete independence is asymptotically optimal with the exact
slope -log |P|, the same as (3).
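
The identity behind (3), namely that the step-down slopes $-\log(1 - \rho^2_{i+1 \cdot 12 \ldots i})$ add up to $-\log|P|$, can be checked numerically for p = 3. The sketch below relies on the standard closed forms for a 3 x 3 correlation matrix (an assumption of this illustration, not a formula quoted from the paper): $|P| = 1 + 2\rho_{12}\rho_{13}\rho_{23} - \rho_{12}^2 - \rho_{13}^2 - \rho_{23}^2$ and $\rho^2_{3 \cdot 12} = (\rho_{13}^2 + \rho_{23}^2 - 2\rho_{12}\rho_{13}\rho_{23})/(1 - \rho_{12}^2)$. Function names are hypothetical.

```python
from math import log


def det3(r12, r13, r23):
    """Determinant of a 3 x 3 correlation matrix with the given off-diagonal entries."""
    return 1.0 + 2.0 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2


def multiple_corr_sq(r12, r13, r23):
    """Squared multiple correlation of variable 3 on variables 1 and 2."""
    return (r13 ** 2 + r23 ** 2 - 2.0 * r12 * r13 * r23) / (1.0 - r12 ** 2)


def slope_from_stepdown(r12, r13, r23):
    """Sum of the component slopes: -log(1 - rho_12^2) - log(1 - rho_{3.12}^2)."""
    return -log(1.0 - r12 ** 2) - log(1.0 - multiple_corr_sq(r12, r13, r23))


def slope_from_determinant(r12, r13, r23):
    """Slope of the likelihood ratio test: -log |P|."""
    return -log(det3(r12, r13, r23))
```

For any valid correlation configuration the two functions should agree up to rounding, and both are zero under complete independence.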


4. AN EMPIRICAL EVALUATION


In this section we present a Monte Carlo study of the
operating characteristics of some of the asymptotic Bahadur
equivalents of the likelihood ratio test for complete independence,
described in Section 3, when the samples are of moderate
size. The study also includes Nagao's test given in Section 2.
The finite sample behavior of the tests is investigated in
terms of the power function as well as in terms of the means and
s.d.'s of the P-values of the tests at various alternatives.


4.1 The Monte Carlo Experiment. The simulation study was conducted
on an IBM 3032 at the University of Rochester, generating the
random samples from IMSL routine GGNRM. 3000 samples of size
n = 20 and n = 30 were drawn from $N_p(0, \Sigma)$ with
$\sigma_1^2 = \sigma_2^2 = \dots = \sigma_p^2 = 1$ and various configurations of correlations
$\rho_{ij}$ from values 0, .2, .4, .6 and .8 for p = 3, 4, 5. For each
sample drawn, the following test statistics were obtained from
the sample correlation matrix $R = (r_{ij})$:

(i) Likelihood ratio based statistic $\ell = -\{N-1-(2p+5)/6\} \log |R|$,

(ii) Nagao's test statistic $T = (N-1) \sum_{i<j} r_{ij}^2$,

(iii) Step-down statistics $F_i = [(N-i)/(i-1)] \cdot [R^2_{i \cdot 12 \ldots (i-1)} / (1 - R^2_{i \cdot 12 \ldots (i-1)})]$, for i = 2, 3, ..., p,

(iv) Combination statistic based on the logit method
$\Psi_L = -\sum_{i=2}^{p} \log(P_i/(1-P_i))$,

(v) Combination statistic based on Fisher's method
$\Psi_F = -2 \sum_{i=2}^{p} \log P_i$, where the $P_i$ are the P-values
corresponding to the step-down statistics.
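
A minimal sketch of how statistics (i) and (ii) might be computed from a sample correlation matrix is given below. It is not the authors' program (which used IMSL routines); the function names are hypothetical, the determinant is obtained by plain Gaussian elimination, and R is passed as a list of lists.

```python
from math import log


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det                      # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det


def lr_statistic(R, N):
    """Statistic (i): -{N - 1 - (2p + 5)/6} log |R|."""
    p = len(R)
    return -(N - 1 - (2 * p + 5) / 6.0) * log(determinant(R))


def nagao_statistic(R, N):
    """Statistic (ii): (N - 1) times the sum of squared off-diagonal correlations, i < j."""
    p = len(R)
    return (N - 1) * sum(R[i][j] ** 2 for i in range(p) for j in range(i + 1, p))


# Example: p = 3, N = 20, only r_12 = .5 nonzero.
R_example = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

Both statistics vanish when R is the identity matrix and grow as the off-diagonal correlations move away from zero.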




These test statistics were compared with their critical constants
determined using the following facts regarding the null distributions:

(i) $\ell$ is well approximated with the $\chi^2$ distribution having
p(p-1)/2 d.f.;

(ii) the critical constant $T_\alpha$ for T may be approximated
by

\[ T_\alpha \approx u + \frac{1}{N} \left\{ \frac{2 a_6 u}{f(f+2)(f+4)} \left[u^2 + (f+4)u + (f+2)(f+4)\right] + \frac{2 a_4 u}{f(f+2)} (u + f + 2) + \frac{2 a_2 u}{f} \right\}, \]

where u is the upper $100\alpha$ percentage point of the
$\chi^2$ distribution with d.f. f = p(p-1)/2, and $a_2$,
$a_4$, $a_6$ are given in (1);

(iii) $F_i$ is distributed as a variance ratio F with
(i-1, N-i) d.f., for i = 2, 3, ..., p;

(iv) $\Psi_L$ is approximated with $a t_\nu$, a constant times
Student's t, where $a = \pi \{k(5k+2)/(3(5k+4))\}^{1/2}$,
k = p-1, and d.f. $\nu = 5k+4$;

(v) $\Psi_F$ is distributed as a $\chi^2$ with 2(p-1) d.f.
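
The approximation for Nagao's critical constant in (ii) can be sketched as below. The code assumes the reading $T_\alpha \approx u + N^{-1}\{2a_6 u [u^2 + (f+4)u + (f+2)(f+4)]/[f(f+2)(f+4)] + 2a_4 u (u+f+2)/[f(f+2)] + 2a_2 u/f\}$ with $a_6 = (p^3 - 3p^2 + 2p)/12$, $a_4 = (-2p^3 + 3p^2 - p)/8$, $a_2 = (p^3 - p)/4$; the 1/N scaling in particular is an assumption, since the printed display is hard to read. The chi-square percentage point u is supplied by the caller, and the function name is hypothetical.

```python
def nagao_critical_constant(u, p, N):
    """Approximate critical constant for Nagao's T.

    u : upper percentage point of chi-square with f = p(p-1)/2 d.f. (caller supplies it)
    p : number of variables
    N : sample size (the 1/N scaling is an assumption of this sketch)
    """
    f = p * (p - 1) / 2.0
    a6 = (p ** 3 - 3 * p ** 2 + 2 * p) / 12.0
    a4 = (-2 * p ** 3 + 3 * p ** 2 - p) / 8.0
    a2 = (p ** 3 - p) / 4.0
    term6 = 2 * a6 * u / (f * (f + 2) * (f + 4)) * (u * u + (f + 4) * u + (f + 2) * (f + 4))
    term4 = 2 * a4 * u / (f * (f + 2)) * (u + f + 2)
    term2 = 2 * a2 * u / f
    return u + (term6 + term4 + term2) / N


# Example: p = 2 (f = 1), N = 20, 5% point of chi-square with 1 d.f.
example = nagao_critical_constant(3.841459, 2, 20)
```

For p = 2 the correction is small and negative, so the approximate critical constant sits slightly below the chi-square percentage point.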
The power of each test was estimated by the proportion of
times the null hypothesis was rejected by the corresponding test.
The s.d. of any of these estimates is at most $(4 \times 3000)^{-1/2} \approx .009$. The P-values
corresponding to the tests were obtained using the equation (1)
and the results on the null distributions as mentioned above.
The P-values in each case were averaged and their standard deviation
was computed.

The estimated power functions and the means of the P-values
of the five tests at various alternatives are given in Table 1
and Table 2 respectively. The Monte Carlo experiment with 3000
simulations was first conducted with n = 20 and p = 3 for the
correlation configurations appearing in the tables. After an
examination of the results it was performed with p = 4 and 5 for two
special configurations, namely (i) the extreme configuration in
which only the first correlation $\rho_{12}$ is non-zero,
and (ii) the symmetrical configuration where all correlation
coefficients are equal. As a confirmation of the findings, the
procedure was repeated with n = 30. The s.d.'s of the P-values and
the results for n = 30, which are not included in this paper, are
available from the authors.
TABLE 1: The empirical power functions for samples of size n = 20 with
Monte Carlo of size 3000.

(Row labels giving the nonzero correlations and their values are illegible
in the source; blank lines separate the correlation configurations.)

 p    L.R.     Nagao    Step Dn   Logit    Fisher
      Test     Test     Test      Comb.    Comb.

 3    .0490    .0480    .0527     .0503    .0473

      .0880    .0900    .0993     .1003    .1027
      .2673    .2650    .3300     .2957    .3227
      .6503    .6357    .7540     .6437    .7217
      .9823    .9780    .9930     .9717    .9877

      .0963    .0957    .0867     .0893    .0883
      .2710    .2663    .2420     .2197    .2347
      .6733    .6493    .6470     .5347    .6107
      .9800    .9770    .9757     .9390    .9717

      .0990    .0967    .0930     .0877    .0890
      .2713    .2660    .2373     .2280    .2350
      .6570    .6430    .6273     .5393    .5953
      .9717    .9650    .9707     .9210    .9587

      .1393    .1417    .1320     .1420    .1403
      .5983    .5400    .5270     .6200    .6170
      .9990    .9930    .9933     .9977    .9990

      .1387    .1357    .1187     .1173    .1220
      .5967    .5403    .5663     .4767    .5407
      .9980    .9947    .9977     .9817    .9973

      .1583    .1550    .1543     .1660    .1673
      .5843    .5410    .5203     .6050    .5987
      .9987    .9940    .9923     .9980    .9987

      .1890    .2150    .1693     .1933    .1937
      .6340    .6847    .5353     .6453    .6353
      .9490    .9650    .9073     .9567    .9527
      1.000    .9997    .9993     .9997    .9997

 4    .0473    .0457    .0457     .0480    .0463

      .0763    .0803    .0960     .0953    .1020
      .2013    .1937    .2817     .2350    .2707
      (illeg.) .4837    .7150     .5530    .6643
      .9400    .8993    .9840     .9167    .9727

      .2253    .2850    .1680     .2283    .2257
      .7270    .8153    .5823     .7440    .7343
      .9833    .9923    .9427     .9860    .9837
      1.000    1.000    1.000     1.000    1.000

 5    .0470    .0463    .0447     .0527    .0473

      .0670    .0677    .0897     .0817    .0857
      .1520    .1493    .2517     .2043    .2313
      .3923    .3607    .6760     .4573    .5823
      .8727    .7377    .9840     .8673    .9597

      .2773    .3820    .1887     .2867    .2763
      .8223    .9080    .6150     .8347    .8220
      .9897    .9963    .9537     .9903    .9897
      1.000    1.000    1.000     1.000    1.000
TABLE 2: Estimated means of the P-values at various alternatives for
samples of size 20 with Monte Carlo of size 3000.

(Row labels giving the nonzero correlations and their values are illegible
in the source; blank lines separate the correlation configurations.)

 p    L.R.     Nagao    Step Dn   Logit    Fisher
      Test     Test     Test      Comb.    Comb.

 3    .4345    .4340    .4250     .4278    .4239
      .2482    .2485    .2201     .2418    .2213
      .0775    .0805    .0555     .0863    .0610
      .0448    .0062    .0023     .0069    .0033

      .4229    .4215    .4314     .4339    .4302
      .2544    .2553    .2750     .2926    .2749
      .0793    .0823    .0900     .1234    .0968
      .0048    .0061    .0052     .0155    .0070

      .4221    .4216    .4334     .4330    .4310
      .2531    .2543    .2739     .2916    .2736
      .0834    .0832    .0919     .1248    .0975
      .0059    .0076    .0065     .0170    .0087

      .3659    .3658    .3656     .3708    .3610
      .0984    .1081    (illeg.)  .1000    .0950
      .0007    .0035    .0028     .0008    .0007

      .3627    .3641    .3791     .3864    .3773
      .1009    .1102    .1141     .1459    .1190
      .0007    .0034    .0009     .0069    .0012

      .3606    .3613    .3607     .3637    .3562
      .1011    .1101    .1134     .1018    .0976
      .0007    .0034    .0027     .0008    .0006

      .3217    .3152    .3340     .3250    .3237
      .0985    .0871    .1183     .0970    .0978
      .0115    .0084    .0192     .0112    .0114
      .0001    .0000    .0003     .0001    .0001

 4    .5030    .5017    .5037     .5064    .5047

      .4509    .4491    .4291     .4405    .4312
      .3090    .3105    .2523     .2929    .2586
      .1202    .1280    .0689     .1191    .0812
      .0124    .0204    .0031     .0177    .0057

      .2876    .2705    .3145     .2910    .2904
      .0660    .0484    .0981     .0643    .0653
      .0040    .0018    .0125     .0034    .0039
      .0000    .0000    .0002     .0000    .0000

 5    .5025    .5047    .5012     .5035    .5029

      .4640    .4643    .4400     .4483    .4397
      .3353    .3346    .2626     .3056    .2724
      .1658    .1769    .0838     .1534    .1008
      .0251    .0451    .0041     .0288    .0089

      .2639    .2334    .3042     .2674    .2668
      .0407    .0236    .0832     .0393    .0414
      .0023    .0008    .0099     .0018    .0020
      .0000    .0000    .0002     .0000    .0000
4.2 Conclusions. Two features of the comparative behavior
of the five tests clearly emerge from the two tables: (i) In
the case of the extreme configuration, with $\rho_{12} \ne 0$, $\rho_{ij} = 0$
otherwise, the step-down procedure is preferable. Its superiority
over the other four tests increases as p increases. Nagao's
test is the poorest in this case. (ii) Nagao's test dominates
the others if $H_0$ is violated in a symmetric manner, i.e., when
the $\rho_{ij}$'s are nonzero and equal. The stepdown test is the weakest
in this case. It is also observed that the likelihood ratio test
and the two combinations of the P-values of the stepdown components
are generally comparable and are preferable except against
the two special alternatives. In view of the added resolution
of the information, the authors are inclined to recommend the
procedures based upon the combinations of the P-values.
SUMMARY. In this paper some general results are given for the
asymptotic relative efficiency of Kendall's $\tau$ and Spearman's
$\rho$ under tests for independence. Discussion then turns to the
bivariate exponential distribution (BVE) of Marshall and Olkin
(1967) and the absolutely continuous bivariate exponential
distribution (ACBVE) of Block and Basu (1974). Power calculations
are carried out for several statistics, and asymptotic
relative efficiencies are calculated.


KEY WORDS. tests of independence, bivariate exponential, life 
testing, Kendall's tau, Spearman's rho, Pitman relative 
efficiency, locally most powerful rank test, power study. 


1. INTRODUCTION 


An important topic in the area of statistical inference is 
the testing of independence for a pair or group of random 
variables. Under normality of the variables the results dealing 
with this topic are extensive, both in parametric and nonpara- 
metric cases. Beyond this, few other models have been available 
for similar analyses. Besides the general result of Blum et al.
(1961), some results have been attained for specially contrived
models of dependency, but these models are in general not
physically appealing. 


An area that is more open for investigation in testing for 
independence is that of distributions of the bivariate exponen- 
tial type, which are more applicable as models in many physical 
problems, for example, in the areas of life testing and reliability 
analysis. Basu and Block (1975) give a good summary of distri- 
butions in this category. 


In this paper we will primarily concern ourselves with the 
models proposed by Marshall and Olkin (1967) and Block and Basu 
(1974) which have appealing physical properties. The purpose 
of this work is the investigation of appropriate tests of inde- 
pendence for these models, both parametric and nonparametric, and 
a comparison of their powers. 


We will primarily study the behavior of two well-known
nonparametric tests, namely Kendall's $\tau$ and Spearman's $\rho$. Farlie
(1960) showed that these tests are locally most powerful (LMP) and
asymptotically equivalent for testing $H_0: \lambda_{12} = 0$ against
$H_1: \lambda_{12} \neq 0$ when the underlying model is of the form

$$H(x,y) = F(x)\,G(y)\,\{1 + \lambda_{12}\,\bar{F}(x)\,\bar{G}(y)\},$$

where $\bar{F} = 1 - F$ and $\bar{G} = 1 - G$. Note that this class contains the
well-known bivariate exponential distribution of Gumbel (1960). 
It seemed these tests would perform well for other bivariate 
exponential models as well. 


It is well known that Kendall's $\tau$ and Spearman's $\rho$, after
appropriate linear transformation, are asymptotically equivalent
under the hypothesis of independence (see Hájek and Šidák, 1967,
p. 60). If we consider contiguous alternatives of the form
$\lambda_{12} = c/\sqrt{N}$, it then follows from the definition of contiguity
that they are also asymptotically equivalent under contiguous
alternatives. Thus ARE(R,T) = 1, where ARE denotes Pitman
asymptotic relative efficiency.


2. TESTS FOR INDEPENDENCE 


The survival function of the bivariate exponential distribu-
tion (BVE) of Marshall and Olkin (1967) is given by

$$\bar{F}(x,y) = P(X>x,\ Y>y) = \exp[-\lambda_1 x - \lambda_2 y - \lambda_{12}\max(x,y)],$$

where $\lambda_1, \lambda_2 > 0$, $\lambda_{12} \geq 0$, and $x, y > 0$. The marginal survival functions are

$$\bar{F}(x) = P(X>x) = \exp[-(\lambda_1+\lambda_{12})x],$$

$$\bar{F}(y) = P(Y>y) = \exp[-(\lambda_2+\lambda_{12})y].$$


Note that $\lambda_{12} = 0$ implies X and Y are independent, and
since $\rho = \lambda_{12}/\lambda$, where $\lambda = \lambda_1 + \lambda_2 + \lambda_{12}$, if X and Y are not
independent, they must be positively correlated. Thus, a test of
independence takes the form $H_0: \lambda_{12} = 0$ versus $H_1: \lambda_{12} > 0$.
Now $P(X=Y)$ also equals $\lambda_{12}/\lambda$, so any time the sample has a
diagonal element, the null hypothesis should be rejected. Thus,
for a bivariate sample $\{(X_1,Y_1), (X_2,Y_2), \ldots, (X_n,Y_n)\}$, $H_0$ is
rejected if $X_i = Y_i$ for any i. Since $P(X_i = Y_i) = \lambda_{12}/(\lambda_1+\lambda_2+\lambda_{12})
= \rho$, this event has probability $1-(1-\rho)^n$. For this model,
$\rho \geq 0$, so for a test statistic $S_n$, the power function of the
test is

$$1-(1-\rho)^n + (1-\rho)^n\,P(S_n > s_\alpha \mid X \neq Y),$$

where under $H_0$, $P(S_n > s_\alpha \mid X \neq Y) = \alpha$, the significance level of
the test. This is because under $H_0$, $\rho = 0$, and the power function
becomes $P(S_n > s_\alpha \mid X \neq Y)$. Note that rejecting for diagonal elements
does not contribute to the type I error. Also note that $1-(1-\rho)^n$
gives a lower bound on all power functions.
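
The power relation above can be evaluated numerically. The following is a minimal sketch, not the authors' code; the function names `bve_power` and `bve_power_lower_bound` are hypothetical, and the conditional power off the diagonal is simply passed in as a number.

```python
def bve_power(rho: float, n: int, conditional_power: float) -> float:
    """Power of a test that rejects on any diagonal observation (X_i = Y_i)
    and otherwise applies a test with the given conditional power.

    rho is the BVE correlation, which also equals P(X = Y) in this model.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    no_diagonal = (1.0 - rho) ** n  # probability that no pair falls on the diagonal
    return 1.0 - no_diagonal + no_diagonal * conditional_power


def bve_power_lower_bound(rho: float, n: int) -> float:
    """Lower bound 1 - (1 - rho)^n, attained when the conditional power is zero."""
    return bve_power(rho, n, 0.0)


if __name__ == "__main__":
    # Under independence (rho = 0) the power reduces to the significance level.
    print(bve_power(0.0, 10, 0.05))
    # Lower bound on the power for rho = 0.1 and n = 20.
    print(bve_power_lower_bound(0.1, 20))
```

With rho = 0 the function returns the conditional power unchanged, which mirrors the remark that rejecting on diagonal elements does not add to the type I error.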


Bemis et al. (1972) derive the uniformly most powerful (UMP)
test for the case when $\lambda_1$ and $\lambda_2$ are known, and Bhattacharyya
and Johnson (1973) derive the UMP test for the case of identical
marginals, that is, when $\lambda_1 = \lambda_2$.


The second bivariate exponential distribution to be considered
is that of Block and Basu (1974), called the absolutely continuous
bivariate exponential distribution (ACBVE). Its survival function
$\bar{F}(x,y)$ is given by

$$\bar{F}(x,y) = \frac{\lambda}{\lambda_1+\lambda_2}\exp[-\lambda_1 x - \lambda_2 y - \lambda_{12}\max(x,y)] - \frac{\lambda_{12}}{\lambda_1+\lambda_2}\exp[-\lambda\max(x,y)],$$

$x, y > 0$, $\lambda_1, \lambda_2 > 0$, $\lambda_{12} \geq 0$, where $\lambda = \lambda_1 + \lambda_2 + \lambda_{12}$, with marginals:

$$\bar{F}(x) = \frac{\lambda}{\lambda_1+\lambda_2}\exp[-(\lambda_1+\lambda_{12})x] - \frac{\lambda_{12}}{\lambda_1+\lambda_2}\exp(-\lambda x),$$

$$\bar{F}(y) = \frac{\lambda}{\lambda_1+\lambda_2}\exp[-(\lambda_2+\lambda_{12})y] - \frac{\lambda_{12}}{\lambda_1+\lambda_2}\exp(-\lambda y).$$

As with the BVE model, $\lambda_{12} = 0$ implies X and Y are independ-
ent exponential variates with parameters $\lambda_1$ and $\lambda_2$ respective-
ly. It is well known (see Block and Basu, 1974) that the ACBVE
is the absolutely continuous component of the BVE. An immediate
result of this which will be used later is that the ACBVE results
when the BVE is conditioned off the diagonal. Thus when testing
for independence under the bivariate exponential distributions
of Marshall-Olkin and Block-Basu the power functions are
related by:

$$P_{MO} = 1-(1-\rho)^n + (1-\rho)^n\,P_{BB},$$

where $P_{MO}$ and $P_{BB}$ are the power functions of the Marshall-
Olkin and Block-Basu models respectively, n denotes the sample
size, and $\rho$ the correlation of the random variables under
the Marshall-Olkin model.


First consider the case of equal marginals with $\lambda_1 = \lambda_2 =
\beta$ and $\lambda_{12} = \delta$. Using the distributional results of Bhattacharyya
and Johnson (1973), it can be seen that the UMP statistic U is
the quotient of two independent gamma random variables. Thus,
it is a simple matter to obtain the moments of U. A second
statistic is based on the MLE $\hat{\delta}$ of $\delta$ as derived by Mehrotra
and Michalek (1976). Moments are again easily obtained. By
straightforward calculation (see Weier and Basu, 1978), one
obtains the following:


Theorem 1. For the ACBVE model of Block-Basu when testing for
independence with the statistics T = Kendall's $\tau$, R =
Spearman's $\rho$, U = UMP statistic, and M = maximum likelihood
statistic, in the identical marginal case ARE(T,U) = ARE(R,U) =
ARE(T,M) = ARE(R,M) = .5, and ARE(M,U) = 1. Note that the
efficiency of 1 will also hold for the BVE model due to the
relationship of the two power functions.


Using the results of Shirahata (1974), the LMP rank test for 
the ACBVE model can be shown to be 


which, after some simplification, reduces to the statistic
For details see Weier and Basu (1978).


In several instances the statistic giving the LMPRT is of
the form where the argument, say $A_n(R_i, Q_i)$, factors into
$A_n(R_i)\,B_n(Q_i)$. Examples are the normal scores statistic and
Spearman's $\rho$. Although the ACBVE model does not give rise to
such a statistic, the presence of exponentiality would make it
seem of interest to check the performance of such a statistic
where $A_n$ and $B_n$ are the expected values of order statistics
from an exponential distribution. We then have

$$S = \sum_{i=1}^{n} A_n(R_i)\,B_n(Q_i) = \sum_{i=1}^{n}\Big[\sum_{j=n-R_i+1}^{n}\frac{1}{j}\Big]\Big[\sum_{j=n-Q_i+1}^{n}\frac{1}{j}\Big].$$


Here S will be called the exponential scores statistic. Suit- 
ability of an exponential score statistic for a life testing 
problem was studied by Basu (1968). All the statistics, except 
L and S, are known to be asymptotically normally distributed. 
We next compare their powers. 


3. POWER COMPARISONS 


Large sample power calculations are given in Table 1.
Recall that the power functions for the BVE and ACBVE models
are related by $P_{MO} = 1-(1-\rho)^n + (1-\rho)^n P_{BB}$, where $P_{MO}$ ($P_{BB}$) is
the power for the BVE (ACBVE) distribution, and $\rho = \lambda_{12}/\lambda =
\delta/(2\beta+\delta)$ is the correlation coefficient for the BVE distribution.
The tabled values are the probabilities of rejection when $\alpha$ is
the significance level, and $\rho$ indicates the value of the
correlation coefficient under the alternatives. Recall that
this is a one-sided test. $\chi^2$ indicates the power of the statistic
proposed by Bemis et al. (1972) in the case where $\lambda_1$ and $\lambda_2$
are known. The values are exact chi-square probabilities, and
they provide upper bounds on powers for both models. U provides
exact F-probabilities for the UMP test of Bhattacharyya and
Johnson (1973). T indicates the normal approximation to the
distribution of Kendall's $\tau$. r indicates powers obtained using
the normal approximation to the distribution of the usual
product-moment correlation estimator. The convergence to
normality of this statistic is slow under normal alternatives,
and the moments used are those obtained under normal assumptions,
so the reliability of the values obtained here for the exponential
models is questionable.


As should be the case, other than the $\chi^2$ test, the UMP
statistic has the best performance throughout. r performs
somewhat better than T, but again the moments and convergence
to normality are appropriate properties under normality but are
questionable in this situation. Subsequent small sample studies
did tend to bear out this result, however. It can be noted
throughout the table that in the ACBVE case, for T to achieve
equivalent power with respect to U, the sample size must be
doubled. For example, consider $\alpha = .10$ and $\rho = .05$; for n = 80
the power for U is .1650 while for n = 160 the power for T
is .1649. This provides evidence supporting the earlier result
that ARE(T,U) = .5.


For small sample considerations, the null distributions of
the LMPRT statistic L and the exponential scores statistic S
were generated for n = 4, 5, and 6. For fixed increasing X
ranks, all permutations of the Y ranks were generated, and
values of the statistic were computed in each case. For $n \geq 7$
the magnitude of n! begins to prohibit further generation of
the null distributions. Selected quantiles of the computed
distributions are given in Table 2. Note that positive
correlation is indicated by large values of both statistics,
so independence would be rejected in favor of positive correla-
tion when values of the statistic exceed the appropriate upper
quantiles.
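
The enumeration described above can be sketched for the exponential scores statistic S. This is an illustrative sketch, not the authors' program. It assumes the scores are the expected order statistics of a standard exponential sample, $\sum_{j=n-r+1}^{n} 1/j$ for rank r; the function names are hypothetical. The LMPRT statistic L is not included.

```python
from itertools import permutations


def exponential_scores(n: int) -> list[float]:
    """Expected value of the r-th order statistic (r = 1..n) from a standard
    exponential sample of size n: sum of 1/j for j = n-r+1, ..., n."""
    return [sum(1.0 / j for j in range(n - r + 1, n + 1)) for r in range(1, n + 1)]


def null_distribution_S(n: int) -> list[float]:
    """All n! values of S = sum_i a(R_i) a(Q_i), with the X ranks fixed in
    increasing order and the Y ranks running over every permutation."""
    a = exponential_scores(n)
    values = []
    for y_ranks in permutations(range(n)):
        values.append(sum(a[i] * a[q] for i, q in enumerate(y_ranks)))
    return sorted(values)


if __name__ == "__main__":
    dist = null_distribution_S(4)
    print(len(dist), round(dist[0], 4), round(dist[-1], 4))
```

For n = 4 the smallest value produced this way is about 2.3056, in line with the first n = 4 entry for S in Table 2, which supports the assumed form of the scores.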


The results of a small sample Monte Carlo study are given in
Table 3. The tabled values are the number of rejections out of
10,000 trials for n = 6 with $\alpha$ (the significance level) = .10,
and $\beta = 1$. U, L, and S are as used previously. T and R denote
Kendall's $\tau$ and Spearman's $\rho$ respectively. r again represents
the usual product moment estimate of correlation; $r_t$ yields the
number of rejections obtained when using the .9 quantile of the
usual t statistic for r, which is appropriate for data which
is normally distributed; and $r_s$ is the number of rejections
obtained using the .9 quantile of a previous simulation of ACBVE
data based on 10,000 trials. Exact critical regions of size .10
were obtained for L, S, T, and R by randomization on the .9
quantiles.
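
Randomization on a quantile can be illustrated as follows. This is a hedged sketch of the usual randomized upper-tail test for a discrete null distribution, and it may differ in detail from the procedure the authors used; the function name `randomized_critical_region` is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def randomized_critical_region(null_values, alpha):
    """Return (c, gamma) such that rejecting when the statistic exceeds c,
    and rejecting with probability gamma when it equals c, gives size alpha.

    null_values lists the statistic over all equally likely outcomes
    (for a rank test, one value per permutation).
    """
    counts = Counter(null_values)
    total = len(null_values)
    tail = Fraction(0)  # P(statistic > current candidate value)
    for value in sorted(counts, reverse=True):
        mass = Fraction(counts[value], total)
        if tail + mass > alpha:
            return value, (alpha - tail) / mass
        tail += mass
    raise ValueError("alpha must be smaller than 1")


if __name__ == "__main__":
    # Toy null distribution: ten equally likely values 1..10, size 1/4.
    c, gamma = randomized_critical_region(list(range(1, 11)), Fraction(1, 4))
    print(c, gamma)
```

In the toy case the region rejects outright for the values 9 and 10 and with probability 1/2 at the value 8, so the overall size is exactly 1/4.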


The section of the table labelled ACBVE consists of power 
of the statistics for data generated under the ACBVE model. The 
generation scheme used was that described in Friday and Patil 
(1977), page 543. 


TABLE 2: Null distributions of score statistics. For given
$\alpha$, $S_\alpha$ and $L_\alpha$ are the values such that $P(S \leq S_\alpha) = \alpha$ and
$P(L \leq L_\alpha) = \alpha$.


               n = 4               n = 5               n = 6

 alpha    L_alpha  S_alpha    L_alpha  S_alpha    L_alpha  S_alpha

  .01     1.4619   2.3055     1.7917   2.6819     2.1440   3.3017
  .05     1.5191   2.5555     1.8709   3.0986     2.2500     ?
  .10     1.5762   2.6389     1.9306   3.2653     2.3655   4.0517
  .20       ?        ?        2.1091   3.6819     2.5218   4.5086
  .30     1.7905   3.0555     2.2480   4.1680     2.6858   4.9336
  .40     1.8702   3.3889       ?      4.4597     2.8323     ?
  .50     1.9333   3.6666     2.4464   4.7514     2.9549     ?
  .60       ?      3.8889       ?      5.1889     3.1089   6.1681
  .70     2.2905   4.8055     2.8837     ?        3.2655   6.6889
  .80     2.3619     ?        2.9424   6.5569     3.4764     ?
  .90     2.5429   5.3889     3.1210     ?        3.7061   8.3975
  .95     2.5762   5.6666     3.2599   7.4041     3.9109   8.8556
  .99     2.6714   5.8055     3.4682   7.6055     4.1266   9.3000


TABLE 3: Small sample power.


                          U      L      S      T      R     r_s    r_t

ACBVE
 lambda_12    rho
    0.0       0.0       1025   1026   1002   1043   1050   1049   1248
    0.506     0.1       1606   1380   1362   1362   1315   1363   1602
    1.414     0.2       2413   1895   1860   1788   1774   1966   2246
   21.489     0.4       4221   2879   2809     ?      ?    3128   3526

Weibull
    m         rho
    0.75      0.0       6126    993    960   1000    988    798    978
              0.2       8056   1889   1826   1763   1748   1651   2019
    1.5       0.0        350   1021   1014    974    988   1158   1322
              0.2       1092   1911   1840   1762     ?    2156   2518

Normal
              rho
              0.0          4    981    974   1006   1009    795   1002
              0.5       1935   2247   2182   2247   2286   2106   2518

Lognormal
              rho
              0.0       1098    957     ?    1002    996   1066   1248
              0.3         ?      ?      ?      ?    2301   2299   2613
The values of $\rho$ in this section refer to the correlation
coefficient of the ACBVE distribution, not that of the BVE as
in the large sample study. For the ACBVE,

$$\rho = \frac{4\beta\delta + 3\delta^2}{16\beta^2 + 20\beta\delta + 7\delta^2}$$

[see Block and Basu, 1974, letting $\lambda_1 = \lambda_2 = \beta$;
$\lambda_{12} = \delta$]. Note that the maximum value occurs for $\delta \to \infty$, and it
is 3/7 or, approximately, .4286. The simulation was not done for
the BVE model due to the direct relationship of the two models'
power functions.
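
As a quick numerical check of the correlation formula, the sketch below evaluates it at $\beta = 1$ for the $\lambda_{12}$ values listed in the ACBVE section of Table 3. The function name `acbve_correlation` is hypothetical, and the formula is the equal-marginal form $(4\beta\delta + 3\delta^2)/(16\beta^2 + 20\beta\delta + 7\delta^2)$ as read from the text.

```python
def acbve_correlation(beta: float, delta: float) -> float:
    """Correlation of the ACBVE with lambda_1 = lambda_2 = beta and
    lambda_12 = delta, using the equal-marginal formula quoted in the text."""
    numerator = 4.0 * beta * delta + 3.0 * delta ** 2
    denominator = 16.0 * beta ** 2 + 20.0 * beta * delta + 7.0 * delta ** 2
    return numerator / denominator


if __name__ == "__main__":
    for delta in (0.0, 0.506, 1.414, 21.489):
        print(delta, round(acbve_correlation(1.0, delta), 4))
    # The ratio approaches 3/7 as delta grows without bound.
    print(round(acbve_correlation(1.0, 1e9), 4))
```

The printed correlations come out close to 0.0, 0.1, 0.2 and 0.4, which agrees with the $\rho$ column of the ACBVE rows, and the last line illustrates the limiting value 3/7.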


To study robustness of the tests, a number of bivariate
distributions are considered. The Weibull section is simply
bivariate data from Weibull distributions generated in the manner
of the ACBVE distribution with correlation $\rho$ and then raised
to power m. The normal section is based on bivariate normal
data with means and variances of one and correlation $\rho$. The
lognormal uses pairs $(e^X, e^Y)$ where (X,Y) is generated as
in the normal case.


Except for $r_t$, the number of rejections under $H_0$, which
should be 1000, is acceptable in all cases. For the data of the
ACBVE type, the UMP test certainly performs the best. As is
expected, R and T perform comparably throughout. The in-
appropriateness of using the t distribution to obtain the
critical region for r can be seen since the type I error is
too large, thus exaggerating the power throughout. With the
more appropriate modified critical region, r is second in
performance to the UMP test. However, in general its distribution
and thus this critical region would not be known. The two score
statistics do better than the usual two rank statistics, with
the LMPRT performing better than the exponential scores. However,
it is disappointing that they do not compare more favorably with
the UMP test. If this performance carries through to large
sample power, one would suspect that perhaps the score statistics
are asymptotically as efficient as the other rank tests and not
the UMP test. Since ARE(T,U) = ARE(R,U) = .5, we do know that
$.5 \leq \mathrm{ARE}(L,U) \leq 1$, and the results suggest it may be closer to
.5 than 1, not at all the situation with the bivariate normal
case concerning the normal scores statistic.


Since rank tests are invariant under order-preserving 
transformations their performance remains the same in the Weibull 
case as in the ACBVE. Again the scores statistics do better 
than the other two with the LMPRT the best. The parametric 
statistics are quite distorted with respect to type I error, and 
thus their values are of little consequence since they are 
highly non-robust.


In the normal case $r_t$ is now uniformly most powerful, and
it performs as such. $r_s$ is no longer of importance since
P(Type I) < .1. Now, as would be expected, R and T perform
more favorably than the score statistics based on exponential
assumptions. The same carries through to the lognormal case
as again the orderings of the data have been preserved. Here r
computed for the pairs (ln X, ln Y) would be uniformly most
powerful with numbers of rejections approximately those of $r_t$
in the normal case.


It would seem from the information in this section that
although rank statistics perform well under assumptions of
normality, the same does not appear to be true under this type
of exponential model. Rank statistics are quite lacking in
efficiency when compared to their parametric counterparts. However,
note that this means for the rank statistics to achieve the same
power as the parametric, a much larger sample size is required;
when one considers a given sample size (see Tables 1 and 3), the
resulting powers are not as radically different as the efficiency
results might indicate. Also, one has to be extremely careful
about choosing the right parametric model since the parametric
statistics are highly non-robust. Though of lower efficiency, all
the rank tests are seen to be quite robust as to model variations.


Additional results for multivariate exponential distributions 
have been obtained (see Weier and Basu, 1981) and will appear 
elsewhere. 


This research has been supported in part under contract
N00014-78-C-0655 for the Office of Naval Research.
ON TESTS FOR DETECTING CHANGE IN THE MULTIVARIATE
MEAN


M. S. SRIVASTAVA


Department of Statistics
University of Toronto
Toronto, Canada M5S 1A1


SUMMARY. We consider tests based on one observation on each of
$N \geq 2$ random vectors $x_1, \ldots, x_N$ to decide if the mean vectors
$\mu_1, \ldots, \mu_N$ of the $x_i$'s are all equal against the alternative
that a change has occurred at some unknown point r (i.e.,
$\mu_1 = \cdots = \mu_r \neq \mu_{r+1} = \cdots = \mu_N$). The $x_i$'s are assumed to be
normally distributed with common unknown covariance. An estimate
of the change point r is also given.


KEY WORDS. detecting a change point, multivariate mean, likeli- 
hood ratio test and estimate, distribution of test statistics. 


1. INTRODUCTION
A problem of considerable practical interest is the following:
Given one observation from each of N random vectors $x_1, \ldots, x_N$,
how can we decide whether the means of the $x_i$'s can be con-
sidered to be the same or whether one needs to consider two models
of the form

$$x_i = \mu + \epsilon_i,\ i = 1, \ldots, r, \qquad x_i = \mu^* + \epsilon_i,\ i = r+1, \ldots, N \quad (r < N),$$

where the $\epsilon_i$'s are independent error p-vectors and r is unknown.
Apart from obvious potential applications to the detection
of shifts in production processes, tests of this kind may be
applied to the detection of impacts of social programs (drugs,
advertising campaigns) since the time at which the effects of the
program are felt is usually unknown. When p = 1 and the initial
level ($\mu$) is known, this problem was first considered by Page
(1955) for one-sided alternatives. Chernoff and Zacks (1964) gave
a Bayes test which was later generalized by Gardner (1969) to
two-sided alternatives; the exact distribution along with per-
centage points were given by Sen and Srivastava (1975a). The
unknown variance case was considered by Sen and Srivastava (1975b).
Sen and Srivastava (1975a,b,c) also proposed likelihood ratio
tests and compared their power with Bayes tests. If there has been
only one change and the change occurred near the end, the likeli-
hood ratio tests perform better than the Bayes test. Bhattacharya
and Johnson (1968), Sen and Srivastava (1975a) and Sen (1971) gave
some nonparametric tests. However, when p > 1, Sen and
Srivastava proposed a Bayes test when the covariance matrix $\Sigma$
is of the form $\Sigma = \sigma^2 I$, $\sigma^2$ unknown.


In this paper, we propose an estimate of 'r', the change
point, and two tests for detecting the change point in the multi-
variate mean when the covariance matrix is completely unknown.
While the percentage points are available for one of the tests,
we provide tables for the second test up to p = 5 by Monte
Carlo techniques.


The problem of detecting the change and the problem of esti-
mating the change point can be stated formally as follows:


Problem. Let $x_1, \ldots, x_N$ be independently normally distributed
as $N_p(\mu_i, \Sigma)$, $i = 1, 2, \ldots, N$, $\Sigma > 0$. The problem is to test
the hypothesis

$$H: \mu_1 = \cdots = \mu_N \qquad (1)$$

vs.

$$A: \mu_1 = \cdots = \mu_r \neq \mu_{r+1} = \cdots = \mu_N, \qquad (2)$$

where the point of change 'r' is not known. If the change has
occurred, an estimate of the change point 'r' is desired.


2. THE LIKELIHOOD METHOD 


Let $x_1, x_2, \ldots, x_N$ be independently distributed as
$N_p(\mu_i, \Sigma)$, $i = 1, \ldots, N$, $\Sigma > 0$. Let
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# 1 - N 
DSOIES (x kes ene. Sats - XX.» =>\(N=r) > 
ge fare ot sat : 
r <= N 4 ns 
oR ROY a es ee ay 
aeeaN PAGE Ge 6). Be yt Weg 
and

$$T_{\hat{r}} = \max_{1 \leq r \leq N-1} T_r, \qquad (3)$$

where $\hat{r}$ is the point where the maximum occurs. Then $\hat{r}$ is the
maximum likelihood estimate of r and $T_{\hat{r}}$ is the test statistic
for testing the hypothesis H in (1) vs. A in (2).


2.1 Distribution of the Test Statistic $T_{\hat{r}}$. The distribution of
the statistic $T_{\hat{r}}$ or equivalently $S_{\hat{r}}$ is difficult to obtain.
The percentage points given in Table 1 are obtained by Monte
Carlo techniques. For the benefit of the readers we have repro-
duced in the last portion of the table the percentage points for
the one-sided test given in Sen and Srivastava (1975c).


In order to test the accuracy of these tables, we give below 
an upper and lower bound for the percentage points by Bonferroni 
inequality: 


P bo ee OO es 


Peale Port2 


k 
where “Pr=1--"') P(my)“+ J PCED A EL =" 


= eae <a | 
i=l i, i, Al D, 
+ (-1)" Pe PR ie Ve? ea.) 
P : ab i ‘L 
Lo eee ul P. r 
al r 
E> Boat Ey are events and EY is the complementary events 
of 


i 
Since the statistic $T_r$ is invariant under nonsingular
linear transformations, we shall assume without loss of generality
that $\Sigma = I$. Let
TABLE 1: Percentage points of $P\{S_{\hat{r}} > c\}$. Percentage points for
the one-sided test with p = 1 taken from Sen and
Srivastava (1975c).


n 90% 95% 99% 90% 95% 99%
p = 1 p = 2
9 0.59850 0.67943 0.79930 0.75688 0.81341 0.89097 
10 0.54074 0.62353 0.74597 0.71104 0.77004 0.85087 
i. O; 47743 0 0;55000 .1.0607 L7h 0.62747 0.68406 0.78988 
15 0.40337 0.46775 0.59305 0.53145 0.58507 0.68743 
20 0.32205 0.37556 0.48757 0.42853 0.48202 0.57685 
25 0.26561 0.31068 0.40067 0.35524 0539723 0.49273 
30 0.22703 0.26905 0.35556 0.30784 0.34571 0.43358 
35 0. 20006 0.23469 0.30356 0.26572 0.30244 0.38223 
40 0.17682 0.20983 0.27986 0.23835 0.26818 0235599 
45 0.42983 0.45319 0.50279 0821333 0.24203 0.30162 
50 0.14605 On a22 0.22893 0.19443 0.22079 0.27842 
3) 0.13435 0.14865 0.21530 0.18075 0.20658 O05 25332 
60 ON122 7358 OR La 556 We A) siy/ae 0.16712 0.19042 0.24013 
p = 3 p = 4
9 0.86321 0.90100 0.95274 0.93122 0.95225 0.98030 
10 0.81646 0.86036 O72 0.89216 0.92050 0.95949 
IL 0.73506 0.78009 0.86278 0.81345 0.85251 0.90931 
a 0.62586 0.67883 0.76367 0.70757 0, 75129 2 0. 82921 
20 0.50618 0.55494 0.64622 OSSy/SES 0.62176 0.69995 
25 0.42409 0.46583 0.54144 0.48724 0.53224 0.49789 
30 0.36786 0.40867 0.48409 0.41876 0.45786 0.53528 
35) 0.32011" “0.35600 0.42975 0.37243 0.40890 0.48475 
40 0.28631 0.31873 0.38936 0.32978 0.36244 0.43006 
45 0.25956 0.28885 0.35059 0.29773 0.32898 9.38657 
50 0523737 0.26597 0.32541 0527333 0.30196 0.35932 
55 Oo 217 03a 0524360 0.30658 0.25160 0.27796 0.33059 
60 0.20169 0.22531 0.27546 0.23334 0.25907 0.31084 
p=5 p = 1, one-sided test 
9 0.97470 0.98470 0.99501 2.68766 3226172 4.6487 
10 0.94582 0.96234 0.98435 SBS SER) 3.08961 4.32861 
12 0.87772 0.90674 OF9457 1. 2 oO dug: 3.01776 4.08071 
LS 0.77460 0.81265 0.87580 2 32 2.94701 3.90726 
20 0.63872 0.68443 On7 SDL 2.51568 2.90492 3.76205 
5) 0.54015 0.58265 0.65908 2.48975 2.86356 3.59018 
30 0.46803 0.50539 Ona79oL 2.49352 2.85028 3.57691 
35 0.41320 0.44663 We Suki 7/s' 2.03395 2.85028 3.57691 
40 0.36919 0.40287 0.46901 2.52323 2.85604 3.48837 
45 0.33413 0. 36387 0.42353 2.50628 2.83838 3.48449 
50 0.30807 0.33579 0.39341 2.54559 2.86174 3. 10783 
EIS) 0228220) 30730740 U5 86533 2255002 2.85646 3.54822 
60 0.26006 0.28626 0.33945 2. D428 2.85276 3.46943 


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Joys N - : 
sah pi | ae where V = Lee (x, ~ x) (x, ee A 
and SA. = max s. = = wt aes > ee (4) 
1<r<N-1 a Or 


Then $T_r = s_r(1 - s_r)^{-1}$ and the maximum likelihood test is equi-
valent to the one based on $S_{\hat{r}}$. And the maximum likelihood esti-
mate of r can be obtained either from (3) or (4). We shall
now obtain a lower and an upper bound for $P\{S_{\hat{r}} > c\}$. Let


$$a_r' = [N r (N-r)]^{-1/2}\,\big(\underbrace{(N-r), \ldots, (N-r)}_{r\ \text{terms}},\ \underbrace{-r, \ldots, -r}_{(N-r)\ \text{terms}}\big).$$

Then $s_r = a_r' X' V^{-1} X a_r$. Note that $a_r' e = 0$, where
$e' = (1, \ldots, 1) : 1 \times N$, and $a_r' a_r = 1$. Hence, from the results
of Section 3.1,

$$\frac{N-p-1}{p}\,\frac{s_r}{1 - s_r} \sim F_{p,\,N-p-1},$$

where '$\sim$' denotes 'distributed like' and $F_{m,n}$ denotes an
F-distribution with (m,n) degrees of freedom (d.f.).
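
For intuition, the scan over candidate change points can be sketched in the univariate case p = 1. This is an assumption-laden simplification, not the multivariate procedure of this section: it takes the contrast $a_r$ to have $(N-r)$ in its first r positions and $-r$ in the remaining $N-r$, scaled to unit length, and it replaces the matrix V by the total sum of squares. The function name `change_point_scan` is hypothetical.

```python
def change_point_scan(x):
    """Univariate (p = 1) sketch: for each r = 1..N-1 compute
    s_r = (a_r' x)^2 / sum_i (x_i - mean)^2 and return the maximizing r
    together with the maximum value.

    Assumes the data are not all equal, so the total sum of squares is positive.
    """
    n = len(x)
    mean = sum(x) / n
    total_ss = sum((xi - mean) ** 2 for xi in x)  # plays the role of V when p = 1
    best_r, best_s = None, -1.0
    for r in range(1, n):
        # Unscaled contrast: (N - r) * (sum of first r) - r * (sum of last N - r).
        contrast = (n - r) * sum(x[:r]) - r * sum(x[r:])
        s_r = contrast ** 2 / (n * r * (n - r)) / total_ss
        if s_r > best_s:
            best_r, best_s = r, s_r
    return best_r, best_s


if __name__ == "__main__":
    data = [1, 2, 1, 2, 6, 7, 6, 7]  # illustrative series with a shift after the 4th value
    r_hat, s_max = change_point_scan(data)
    print(r_hat, round(s_max, 4), round(s_max / (1 - s_max), 4))
```

The last printed number is $s/(1-s)$, the univariate analogue of the relation between $T_r$ and $s_r$ stated above.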


We shall now obtain the joint distribution of So and Si 
Doses. Lec 
1 
a 
A. =e: 2xN and M=I-Ntee', 

! 
= 

where e' = (1,1,+++,1) : 1xN. For the sake of notational con- 


venience, we shall drop the subscript '‘t' from Apes 


Let C be a $(N-3) \times N$ matrix of rank $(N-3)$ such that
$Ce = 0$ and $CA' = 0$; note that $Ae = 0$. Then

$$V = XMX' = XA'(AA')^{-1}AX' + XC'(CC')^{-1}CX'.$$

Hence, $XA'$ and $XC'$ are independently distributed. In the


notation of Srivastava and Khatri (1979, pp. 170-171), 


5 N, 5 (0, I, AA’) and xc'(cc') tcx'~ wor, N-3) for 


> 
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p < N-3. We shall assume that p< N-3, p22. Let U= y Ne 


and W= xc'(cc') “cx. Then the joint pdf of U and W is given 
by 


elaa'| 2 he] ee etr - %[UCAA") sum ow] 
where c= ra? (N-3) r@x@-3)) Let 

ie W At CAA) Ue andi ote oR 
Then the joint pdf of R and V is given by 

cfaat | P}v)2-P-2) pr Canty terR| 2PM oer — ay, 
Hence, the pdf of R is given by 


| = 1 se 
c, |Aa"| P| y =, (AK) Tar y 2s p-4) : 


where cy 


the pdf of G (see Srivastava and Khatri 1979, Lemma 3.2.3, p. 
76) is given by 


= Pr Sr Sos ere). Let G = R'R. Then, 


2: 


“hy ay ga 2s GN 
c, |B| “6p Jo] 2°? 3) |Z -B 162s p-4) 


where B= AA' , and 


= 2P gP pQebyr Ey sp Shy Oy @r ezt 
oy any: Witen € 5} las 5} )/T ¢ 5 T¢ 3 PCT C 9 ) 
Letting Bo = ao Ska a we find that 
s g 
Age | ghaags0 
BO 


Hence, to get the joint pdf of oT) and Sie we need to integrate 
out Bo: Expanding the determinants in the above expression for 


the pdf of G, we get the joint pdf of So» S.. and 85 given by 


—'sp = genes) py 2 2 
c,|B| (SS - 85) [1 - B(S, + 8,) + (b’ - d°)S)S. 


2 L(N-p- 
- 2dg + (a = bel sali Be? 


-1 p38 2 i 
where B- = » SpSonrig, > Og and), | TS Re veluoD. 
di: ak 
Expanding the last expression given in the bracket [ ], term by
term integration can be carried out and the joint pdf of $s_r$ and
$s_t$ can be obtained. For computational simplicity, we may choose
N and p such that (N-p-4)/2 is an integer. Thus


N-1 N-1 
stp hart Se E eilaha hack Poon ceahitieal  Sane ee 
= r 1 a 
i=l i=1 
N-1 
+ iS > > 
4 < bet ay “? 
1 
a N-1 
That is; Pie ok oe oe Sh cS ee 
4 e] pl Qa ore a7 Qa a) Q 
2 
N-1 
< Ne < > ° 
P(S~ > c,) oy eee) 


These upper and lower bounds of the percentage points may be
used to check the accuracy of the simulation results given in
Table 1. If these bounds or some linear combination of them (say
the average) is close to the simulation result, then they may be used
to obtain tables for many more values of p. However, this has
not been done here. This, along with power comparisons of the two
procedures, is planned for a future communication. It may be
noted that in all of the studies carried out by Sen and Srivastava
(1975a,b,c, 1973), the likelihood ratio test seems to have
better power than other competing tests when the change point
occurs near the end (a more practical situation). Also, the
likelihood method gives an estimate of the change point.


3. AN ALTERNATIVE TEST 


Let $x_1, \ldots, x_N$ be independently distributed random
p-vectors. Define

$$u = \sum_{i=1}^{N-1} i\,(x_{i+1} - \bar{x})/(a'a)^{1/2} = Xa/(a'a)^{1/2},$$

where $a' = \tfrac{1}{2}[-(N-1), -(N-3), -(N-5), \ldots, (N-5), (N-3), (N-1)]$
and $\bar{x} = N^{-1}\sum_{i=1}^{N} x_i$. Let $V = \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})'$. Then we
propose a test statistic

$$Q = u' V^{-1} u$$

for testing the hypothesis H against the alternative A. It
may be noted that u is the univariate test statistic used by
Chernoff and Zacks (1964) for testing one-sided change when variance
is known. It should be mentioned that a nonparametric generali-
zation of the Bhattacharya and Johnson (1968) test can also be
given on this line.
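
The two forms of the numerator of u can be checked against each other in the univariate case. The sketch below is illustrative only; it assumes the weights $a_j = (2j - N - 1)/2$, which reproduce $\tfrac{1}{2}[-(N-1), -(N-3), \ldots, (N-3), (N-1)]$, and the function names are hypothetical.

```python
def chernoff_zacks_weights(n: int) -> list[float]:
    """Weights a_j = (2j - N - 1)/2 for j = 1..N, i.e.
    (1/2) * [-(N-1), -(N-3), ..., (N-3), (N-1)]."""
    return [(2 * j - n - 1) / 2 for j in range(1, n + 1)]


def cumulative_form(x) -> float:
    """sum_{i=1}^{N-1} i * (x_{i+1} - mean); x[i] with a 0-based index is x_{i+1}."""
    n = len(x)
    mean = sum(x) / n
    return sum(i * (x[i] - mean) for i in range(1, n))


def weighted_form(x) -> float:
    """The same quantity written as the inner product a'x."""
    return sum(a * xi for a, xi in zip(chernoff_zacks_weights(len(x)), x))


if __name__ == "__main__":
    sample = [3, 1, 4, 1, 5, 9]  # arbitrary illustrative numbers
    print(chernoff_zacks_weights(5))
    print(cumulative_form(sample), weighted_form(sample))
```

Both functions return the same number for any input, which is the identity behind writing u as $Xa/(a'a)^{1/2}$.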


3.1 Distribution of the Statistic Q. Let X denote the obser-
vation matrix, $X = (x_1, x_2, \ldots, x_N) : p \times N$. Then, it can easily
be shown that

$$u = Xb, \quad b = a/(a'a)^{1/2}, \quad \text{and} \quad V = XX' - N\bar{x}\bar{x}',$$

where $\bar{x} = N^{-1}\sum_{i=1}^{N} x_i$ and the vector a has been defined above.
Thus

$$Q = b'X'V^{-1}Xb \quad \text{and} \quad 1 + Q = |V + Xbb'X'|/|V|,$$

where |B| denotes the determinant of the matrix B. Let $\Gamma$
be an orthogonal matrix of the order $N \times N$ whose first row is
given by $e'/N^{1/2}$, and the second row is given by $a'/(a'a)^{1/2}$,
where $e' = (1, 1, \ldots, 1) : 1 \times N$, a row N-vector of ones. That is

$$\Gamma = \begin{pmatrix} e'/N^{1/2} \\ a'/(a'a)^{1/2} \\ C \end{pmatrix},$$

where C is an $(N-2) \times N$ matrix such that $\Gamma$ is an orthogonal
matrix.


Let $Y = (y_1, \ldots, y_N) = X\Gamma'$. Then

$$V = \sum_{i=2}^{N} y_i y_i' \quad \text{and} \quad Xbb'X' = Y\Gamma bb'\Gamma'Y' = y_2 y_2',$$

and under the hypothesis H, $y_2, \ldots, y_N$ are independently dis-
tributed as $N_p(0, \Sigma)$. And
$$1 + Q = \frac{|\sum_{i=2}^{N} y_i y_i' + y_2 y_2'|}{|\sum_{i=2}^{N} y_i y_i'|} = \frac{|W + 2\,y_2 y_2'|}{|W + y_2 y_2'|} = \frac{1 + 2\,y_2' W^{-1} y_2}{1 + y_2' W^{-1} y_2},$$

where $W = \sum_{i=3}^{N} y_i y_i'$, and W has the Wishart distribution with
N-2 degrees of freedom, $W \sim W_p(\Sigma, N-2)$. Hence

$$Q = \frac{y_2' W^{-1} y_2}{1 + y_2' W^{-1} y_2}.$$

Note that under H

$$\frac{N-p-1}{p}\,y_2' W^{-1} y_2 \sim F_{p,\,N-p-1},$$

where $F_{p,N-p-1}$ denotes the F-distribution with p and N-p-1 d.f.
Thus, the hypothesis H is rejected if

$$\frac{N-p-1}{p}\,y_2' W^{-1} y_2 \geq F_{p,\,N-p-1,\,\alpha},$$

where $F_{p,N-p-1,\alpha}$ is the upper $100\alpha\%$ point of the F distribu-
tion with p and N-p-1 d.f. Equivalently, H is rejected if

$$Q \geq C_\alpha = 1 - \Big\{1 + \frac{p}{N-p-1}\,F_{p,\,N-p-1,\,\alpha}\Big\}^{-1}.$$

4. ILLINOIS TRAFFIC DATA 


Consider the Illinois Traffic Data given in Table 2. The
table gives the annual data from 1962 to 1971 on hundreds of
traffic deaths, thousands of traffic injuries, thousands of acci- 
dents, and the number of deaths per hundred million vehicle miles 
in the State of Illinois. 


It was felt that the increase in traffic fatalities and 
accidents each year would tend to be constant. A change in the 
mean increase each year would then be attributed to external 
influences such as new regulations and safety standards. For 
example variables of interest would be the increase in traffic 
deaths or injuries from 1962 to 1963. Our interest in the 
increase in traffic deaths and injuries means that we must look 
at the new set of variables formed by taking differences. 
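
Forming the differenced variables is a one-line operation. The sketch below applies it to the injuries and accidents rows of Table 2, the two rows that are fully legible; the helper name `first_differences` is hypothetical.

```python
def first_differences(series):
    """Year-over-year increases: x_i - x_{i-1} for consecutive years."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Thousands of traffic injuries and accidents, 1962 to 1971, as listed in Table 2.
injuries = [112.31, 119.89, 134.16, 145.54, 149.14, 149.51, 148.73, 157.45, 159.88, 148.83]
accidents = [252.02, 258.68, 281.16, 324.07, 329.42, 337.56, 351.07, 405.51, 409.17, 393.57]

if __name__ == "__main__":
    print([round(d, 2) for d in first_differences(injuries)])
    print([round(d, 2) for d in first_differences(accidents)])
```

Ten annual values give nine increases per series, so the change-point analysis works with nine observation vectors rather than ten.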
TABLE 2: Illinois traffic data from 1962 to 1971.


                       1962    1963    1964    1965    1966    1967    1968    1969    1970    1971

Deaths (10^2)         18.90   20.28   22.07   22.56   25.22   24.95   25.00     ?     23.40     ?

Injuries (10^3)      112.31  119.89  134.16  145.54  149.14  149.51  148.73  157.45  159.88  148.83

Accidents (10^3)     252.02  258.68  281.16  324.07  329.42  337.56  351.07  405.51  409.17  393.57

Deaths per 10^8
vehicle miles           4.9     ?       ?       ?      5.3     ?      4.9     4.7     4.2     4.2

Sen and Srivastava (1975c) considered the four types of 
traffic data separately and applied univariate results. Here we 
apply multivariate techniques. Let 

CD iri oP may Oe iOS 
gia Cees oo 


uftd allie 


1329°5" NN, 


j a5253,495 
where Oe is the jth type of traffic data for the ith year. 


In vector notation, the above model becomes 


Ko SoBe, ey) a Le ae Nie 
wo) el ee 
We assume that the $\epsilon_i$'s are independently normally distributed with
mean vector zero and covariance matrix $\Sigma$. When $\Sigma$ is known or
$\Sigma = \sigma^2 I$, $\sigma^2$ unknown, Sen and Srivastava (1973) proposed tests
for testing the hypothesis (1) against the alternative (2). In
this section we test this hypothesis using the statistic given
in Section 3 when $\Sigma$ is completely unknown. An estimate of the
change point 'r' is obtained by the likelihood method of Section
2. Using the results of Section 2, we get the values of $T_r$


obtained. as. 2.64.5 3,925 _ O52) 40.44. de/i U0 ,botg 1. 95. oo oO 
change in mean was estimated to occur after the 4th interval.
That is, there was a shift in the annual increase of traffic
casualties after 1966. The value of the test statistic Q is


0.8750. With C Ab = [1 - {1 + 4) (Aa" x 6.39} 7] = 0.865, we 
reject H and claim that there was a significant shift in the
rise of traffic casualties in Illinois, with the shift estimated
to have occurred in 1966. From Table 1, the likelihood ratio
test rejects H if $T_r/(1 + T_r) > .94$. However,
$T_r/(1 + T_r) = 6.44/7.44 = .85$. Hence H is not rejected.


The discrepancy between the two results is attributed to the fact that
there may be several changes in the mean and, even if there is only one
change, it occurs near the midpoint, where the Bayes
procedure has been shown in the univariate case to be superior.
A TWO-DIMENSIONAL T-DISTRIBUTION AND A NEW TEST
WITH FLEXIBLE TYPE I ERROR CONTROL 


G. LANDENNA and D. MARASINI 


Istituto di Scienze Statistiche e
Matematiche "Marcello Boldrini"
Universita degli Studi di Milano


SUMMARY. The usual t-test for a null hypothesis $H_0 = H_{01} \wedge H_{02}$,
where $H_{0j}: \mu_j = \mu_j^*$, $j=1,2$, concerning the means of two normal
populations with equal but unknown variance, does not account for
the possibility that one of the component hypotheses may be
principal, i.e., scientifically more relevant or important than
the other. In this paper we use some properties of a bivariate
t-distribution to construct a test which permits asymmetric treatment
of the two component hypotheses. The proposed test is shown
to be unbiased and consistent.


KEY WORDS. hypothesis testing, consistent tests, unbiased tests, 
t-tests, t-distribution. 


1. INTRODUCTION AND SUMMARY 


Consider the problem of testing a null hypothesis $H_0 = H_{01} \wedge H_{02}$,
$H_{0j}: \mu_j = \mu_j^*$, $j=1,2$, where $\mu_1, \mu_2$ are the means
of two normal populations with equal but unknown variance $\sigma^2$.
The traditional procedure for $H_0$ is a t-test based on the premise
that the two component hypotheses $H_{0j}$ are equally important, and
permits control of only the overall type I error with respect to
$H_0$. However, it is not uncommon to have practical situations where
one of the components is considerably more important than the
other and it is imperative to adjust the test procedure for this.
In this paper we discuss some properties of a bivariate t-distribution
and use these to construct a test which offers
flexibility in the type I error control with respect to both the
overall null hypothesis $H_0$ and its components $H_{0j}$. Specifically,
we assume that $H_{01}$ is the principal component, i.e., of
primary interest, and propose a test for which


   i) the probability of the type I error with respect to
      $H_0$ equals a specified value $\alpha$, $0<\alpha<1$;
  ii) the probability of the type I error with respect to
      $H_{01}$ equals another specified value $\alpha'$, $0<\alpha'<\alpha<1$.


The type I error control in this proposed test is analogous 
to that in the well-known step-down procedures in multivariate 
analysis as reviewed in Mudholkar and Subbaiah (1980). See also 
a study of a modification of such a procedure by Mudholkar and 
Subbaiah (1981). 


In Section 2 we discuss some properties of a bivariate t- 
distribution. The new test with a flexible type I error control 
is presented in Section 3. Also in this final section, we show 
that the new test is unbiased and consistent, in the sense that 
the power function converges to unity as either the sample sizes 
or noncentrality parameters tend to infinity. 


2. A BIVARIATE t-DISTRIBUTION


Let $m_1$ and $m_2$ denote the means of random samples of
sizes $n_1$ and $n_2$ from two normal populations with a common
variance $\sigma^2$ and means $\mu_1$ and $\mu_2$ respectively. Let $s^2$
denote the pooled estimate of $\sigma^2$ based upon the two samples.
That is, let

$$s^2 = \sum_{i=1}^{2} g_i s_i^2 \Big/ g,$$

where $g = \sum_{i=1}^{2} g_i$, $g_i = n_i - 1$ and $s_i^2 = \sum_j (x_{ij}-m_i)^2/g_i$. Then the
joint p.d.f. of $m_1$, $m_2$ and $s^2$ is

$$\varphi(m_1,m_2,s^2) = \frac{\sqrt{n_1 n_2}}{2\pi\sigma^2}
\exp\left\{-\frac{1}{2}\sum_{i=1}^{2}\left(\frac{m_i-\mu_i}{\sigma/\sqrt{n_i}}\right)^2\right\}
\cdot \frac{(g/2\sigma^2)^{g/2}}{\Gamma(g/2)}\,(s^2)^{g/2-1}\,e^{-g s^2/2\sigma^2}. \qquad (1)$$
Observing that the ratios

$$t_i = \frac{m_i-\mu_i}{s/\sqrt{n_i}}, \qquad i=1,2, \qquad (2)$$

appearing in (1) are distributed as usual Student's t's, and
integrating with respect to $s^2$ we obtain the joint p.d.f. of
$t_1$ and $t_2$ as

$$\psi(t_1,t_2) = \frac{\Gamma\left(\frac{g+2}{2}\right)}{\pi g\,\Gamma\left(\frac{g}{2}\right)}
\left(1 + \frac{t_1^2}{g} + \frac{t_2^2}{g}\right)^{-\frac{g+2}{2}}. \qquad (3)$$

This is the p.d.f. of a particular case of the well-known multivariate
t-distribution, e.g., see Johnson and Kotz (1972). The
marginal p.d.f.'s of this bivariate t-distribution with g
degrees of freedom are the familiar Student's t-distributions
given by

$$\psi_i(t_i) = \frac{\Gamma\left(\frac{g+1}{2}\right)}{\sqrt{\pi g}\,\Gamma\left(\frac{g}{2}\right)}
\left(1 + \frac{t_i^2}{g}\right)^{-\frac{g+1}{2}}, \qquad i=1,2. \qquad (4)$$


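On the other hand, the conditional distributions are not
Student's t-distributions unless the conditioning r.v. takes
values $\pm 1$. For example, the conditional p.d.f. of $t_2$ given $t_1$
is

$$\eta_2(t_2|t_1) = \frac{\psi(t_1,t_2)}{\psi_1(t_1)}
= \frac{\Gamma\left(\frac{g+2}{2}\right)}{\sqrt{\pi g}\,\Gamma\left(\frac{g+1}{2}\right)}
\left(1 + \frac{t_1^2}{g}\right)^{\frac{g+1}{2}}
\left(1 + \frac{t_1^2}{g} + \frac{t_2^2}{g}\right)^{-\frac{g+2}{2}}, \qquad (5)$$

which, when $t_1 = \pm 1$, reduces to the p.d.f. of the Student's r.v.
with (g+1) degrees of freedom.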
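
The two claims above (Student marginals, and a Student conditional with g+1 degrees of freedom at $t_1 = \pm 1$) can be checked numerically. The sketch below assumes the standard form of the bivariate t density with uncorrelated components for (3); the function names and the Simpson-rule integrator are illustrative choices, not the authors' notation.

```python
import math


def student_pdf(t, df):
    """Density of the one-dimensional Student t with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    return c / math.sqrt(math.pi * df) * (1 + t * t / df) ** (-(df + 1) / 2)


def bivariate_t_pdf(t1, t2, g):
    """Assumed form of density (3): bivariate t, g d.f., zero correlation."""
    c = math.exp(math.lgamma((g + 2) / 2) - math.lgamma(g / 2)) / (math.pi * g)
    return c * (1 + (t1 * t1 + t2 * t2) / g) ** (-(g + 2) / 2)


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def marginal_by_integration(t1, g, limit=200.0):
    """Integrate t2 out of the joint density over [-limit, limit]."""
    return simpson(lambda t2: bivariate_t_pdf(t1, t2, g), -limit, limit)


def conditional_pdf(t2, t1, g):
    """Conditional density of t2 given t1: joint density over marginal."""
    return bivariate_t_pdf(t1, t2, g) / student_pdf(t1, g)
```

For example, `marginal_by_integration(0.7, 8)` should agree with `student_pdf(0.7, 8)` up to integration error, and `conditional_pdf(x, 1.0, 8)` should coincide with `student_pdf(x, 9)` for any x.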


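Lemma 1. Let $(T_1,T_2)$ be a two-dimensional Student's r.v. with
p.d.f. (3). If $\varepsilon$ is any positive constant, then:

$$\int_{-\infty}^{\varepsilon(g+t_1^2)^{1/2}} \eta_2(t_2|t_1)\,dt_2
= \int_{-\infty}^{\varepsilon(g+1)^{1/2}} \eta_2(t_2|\pm 1)\,dt_2. \qquad (6)$$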
Proof. We have: 
i 
e(gtt, 7)? 
ese fe ny (t,[t,)dt, 
srs aa 
2 
p (at? e(gtt,) fs 
(BY) greets, et 


(1-y) dy, (7) 
where


Mee 14e7) F (8) 


From the fact that both (7) and (8) are independent of the 
particular value assumed by the conditioning r.v.'s we have (6). 
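
A numerical sanity check of Lemma 1 is sketched below: the conditional probability that $T_2$ falls below $\varepsilon(g+t_1^2)^{1/2}$ should not depend on the conditioning value $t_1$. The conditional density is written as the ratio of the bivariate t density to its Student marginal; the truncation of the lower limit at -300 and the Simpson rule are assumptions of this sketch, adequate for moderate g.

```python
import math


def conditional_pdf(t2, t1, g):
    """Conditional density of T2 given T1 = t1 for the bivariate t with g d.f."""
    c = math.exp(math.lgamma((g + 2) / 2) - math.lgamma((g + 1) / 2))
    c /= math.sqrt(math.pi * g)
    return (c * (1 + t1 * t1 / g) ** ((g + 1) / 2)
            * (1 + (t1 * t1 + t2 * t2) / g) ** (-(g + 2) / 2))


def conditional_prob(eps, t1, g, lower=-300.0, n=6000):
    """P(T2 <= eps * sqrt(g + t1^2) | T1 = t1), by Simpson's rule."""
    upper = eps * math.sqrt(g + t1 * t1)
    h = (upper - lower) / n
    total = conditional_pdf(lower, t1, g) + conditional_pdf(upper, t1, g)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        total += weight * conditional_pdf(lower + i * h, t1, g)
    return total * h / 3


# The three probabilities below should coincide up to integration error.
probs = [conditional_prob(0.5, t1, 8) for t1 in (0.0, 1.0, 3.0)]
```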


Now recalling that the marginals of (T, »T,) are Student's 


r.v. with g degrees of freedom from Lemma 1 we derive the 
following: 


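Lemma 2. Let $(T_1,T_2)$ be a two-dimensional Student's r.v. with
p.d.f. (3). If $\alpha$ and $\alpha'$ are any real constants $(0 < \alpha' < \alpha < 1)$
and $t_1^*$ is the solution of

$$\int_{-\infty}^{t_1^*} \psi_1(t_1)\,dt_1 = 1-\alpha', \qquad (9)$$

then there exists a positive value $\varepsilon^*$ such that:

$$\iint_R \psi(t_1,t_2)\,dt_1\,dt_2 = 1-\alpha. \qquad (10)$$

Here R is the region of the $(t_1,t_2)$-plane defined as follows:

$$R:\ \{-\infty < t_1 \le t_1^*,\ \ -\infty < t_2 \le \varepsilon^*(g+t_1^2)^{1/2}\}. \qquad (11)$$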


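Proof. We can write:

$$(1-\alpha) = \iint_R \psi(t_1,t_2)\,dt_1\,dt_2
= \int_{-\infty}^{t_1^*}\int_{-\infty}^{\varepsilon^*(g+t_1^2)^{1/2}} \psi(t_1,t_2)\,dt_2\,dt_1
= \int_{-\infty}^{t_1^*} \psi_1(t_1) \int_{-\infty}^{\varepsilon^*(g+t_1^2)^{1/2}} \eta_2(t_2|t_1)\,dt_2\,dt_1.$$

But by Lemma 1 we have

$$(1-\alpha) = \int_{-\infty}^{t_1^*} \psi_1(t_1) \int_{-\infty}^{\varepsilon^*(g+1)^{1/2}} \eta_2(t_2|\pm 1)\,dt_2\,dt_1.$$

Hence from (9) we get

$$(1-\alpha) = (1-\alpha') \int_{-\infty}^{\varepsilon^*(g+1)^{1/2}} \eta_2(t_2|\pm 1)\,dt_2.$$

That is,

$$\int_{-\infty}^{\varepsilon^*(g+1)^{1/2}} \eta_2(t_2|\pm 1)\,dt_2 = \frac{1-\alpha}{1-\alpha'}. \qquad (12)$$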

If $t_2^{**}$ denotes the value of the upper limit of integration that solves (12), we have:

$$\varepsilon^*(g+1)^{1/2} = t_2^{**}.$$

Hence

$$\varepsilon^* = t_2^{**}/(g+1)^{1/2}.$$

The region (11) may now be expressed as:

$$-\infty < t_1 \le t_1^*, \qquad -\infty < t_2 \le t_2^{**}\left(\frac{g+t_1^2}{g+1}\right)^{1/2}, \qquad (13)$$

where $t_1^*$ and $t_2^{**}$ are the critical constants corresponding to
levels of significance $\alpha'$ and $(1-\alpha)/(1-\alpha')$ which we obtain
from the one-dimensional Student's tables.


3. THE NEW TEST 


Consider two normal populations with equal but unknown
variance $\sigma^2$ and unknown means $\mu_1$ and $\mu_2$. We want to test
the null hypothesis

$$H_0 = H_{01} \wedge H_{02}, \qquad H_{0j}: \mu_j = \mu_j^*,$$

$j=1,2$, assuming $H_{01}$ as principal. Let $\alpha$ be the level of
significance for $H_0$, and $\alpha'$ that for $H_{01}$, $0 < \alpha' < \alpha < 1$.
Suppose we have random samples of sizes $n_1$ and $n_2$ from these
populations and we have obtained

$$t_1 = \frac{m_1-\mu_1^*}{s/\sqrt{n_1}} \quad \text{and} \quad t_2 = \frac{m_2-\mu_2^*}{s/\sqrt{n_2}},$$

where $m_1$ and $m_2$ are the sample means and $s^2$ is the pooled
estimate of common variance $\sigma^2$. Then $(t_1,t_2)$ has the sampling
distribution of a two-dimensional Student's r.v. $(T_1,T_2)$ with
$g=n_1+n_2-2$ degrees of freedom. It follows that for testing
hypothesis $H_0: H_{01} \wedge H_{02}$ we consider the region (13) and we
accept $H_0$ if $(t_1,t_2) \in R$, otherwise we reject it. This
procedure for R ensures the control of the type I error with
respect to both $H_0$ and $H_{01}$.
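
The acceptance rule can be sketched in code. The sketch assumes that region (13) reads $t_1 \le t_1^*$, $t_2 \le t_2^{**}\{(g+t_1^2)/(g+1)\}^{1/2}$, that $t_1^*$ is the $(1-\alpha')$ quantile of Student's t with g d.f., and that $t_2^{**}$ is the $(1-\alpha)/(1-\alpha')$ quantile with g+1 d.f. (the latter follows from the conditional distribution at $t_1 = \pm 1$). Quantiles are found by bisection on a Simpson-rule cdf rather than read from tables; all function names are hypothetical, and the bracketing interval suits only moderate d.f. and levels.

```python
import math


def student_pdf(t, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    return c / math.sqrt(math.pi * df) * (1 + t * t / df) ** (-(df + 1) / 2)


def student_cdf(x, df, n=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about zero."""
    if x == 0:
        return 0.5
    b = abs(x)
    h = b / n
    total = student_pdf(0.0, df) + student_pdf(b, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * student_pdf(i * h, df)
    half = total * h / 3
    return 0.5 + half if x > 0 else 0.5 - half


def student_quantile(p, df):
    """Quantile by bisection on [-100, 100]."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if student_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def critical_constants(g, alpha, alpha_prime):
    """Return (t1*, t2**) under the assumptions stated in the lead-in."""
    t1_star = student_quantile(1 - alpha_prime, g)
    t2_star2 = student_quantile((1 - alpha) / (1 - alpha_prime), g + 1)
    return t1_star, t2_star2


def accept_h0(t1, t2, g, alpha, alpha_prime):
    """True if (t1, t2) lies in the assumed acceptance region (13)."""
    t1_star, t2_star2 = critical_constants(g, alpha, alpha_prime)
    bound = t2_star2 * math.sqrt((g + t1 * t1) / (g + 1))
    return t1 <= t1_star and t2 <= bound
```

With this construction the acceptance probability under $H_0$ factors as $(1-\alpha') \cdot (1-\alpha)/(1-\alpha') = 1-\alpha$, while the marginal rejection rate for the principal component is $\alpha'$.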


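Now consider the alternative hypotheses:

$$H_1 \wedge H_2, \qquad H_{01} \wedge H_2, \qquad H_1 \wedge H_{02},$$

where $H_1: \mu_1^* < \mu_1$, $H_2: \mu_2^* < \mu_2$. We now show the unbiasedness
of this test and the convergence to unity of the power function
with respect to the above mentioned alternative hypotheses for
large values of the corresponding non-centrality parameters
$\lambda_1, \lambda_2$. Towards this end we write:

$$H_1 \wedge H_2:\ \lambda_1 > 0,\ \lambda_2 > 0; \qquad
H_{01} \wedge H_2:\ \lambda_1 = 0,\ \lambda_2 > 0; \qquad
H_1 \wedge H_{02}:\ \lambda_1 > 0,\ \lambda_2 = 0.$$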

In doing so, we note that a one-dimensional Student's non-central
$\tilde T$ can always be written in the form:

$$\tilde T = \frac{M-\mu_0}{S/\sqrt{n}}
= \left[\frac{M-\mu}{\sigma/\sqrt{n}} + \frac{\sqrt{n}\,(\mu-\mu_0)}{\sigma}\right] \Big/ \frac{S}{\sigma}
= \frac{Z+\lambda}{(\chi_g^2/g)^{1/2}},$$

where M and S are the sample mean and the sample standard
deviation respectively, Z is a r.v. having normal distribution
in the standardized form, $\chi_g^2$ has g degrees of freedom, and
$\lambda = \sqrt{n}\,\delta/\sigma$ is the non-centrality parameter, $\delta = (\mu-\mu_0)$.
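
The decomposition of the non-central statistic is a purely algebraic identity, which the following sketch checks on an arbitrary sample. The sample values and the names `t_direct` and `t_decomposed` are hypothetical; `mu` and `sigma` stand for the true mean and standard deviation, which cancel in the final ratio.

```python
import math
from statistics import mean, stdev


def t_direct(sample, mu0):
    """(M - mu0) / (S / sqrt(n)), computed directly from the sample."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))


def t_decomposed(sample, mu, sigma, mu0):
    """(Z + lambda) / (S / sigma), the decomposed form of the same statistic."""
    n = len(sample)
    z = (mean(sample) - mu) / (sigma / math.sqrt(n))  # standardized mean Z
    lam = math.sqrt(n) * (mu - mu0) / sigma           # non-centrality lambda
    scale = stdev(sample) / sigma                     # plays the role of sqrt(chi2_g / g)
    return (z + lam) / scale


sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1]
```

Both functions return the same number for any choice of `mu` and `sigma > 0`, which is why the power depends on the unknown parameters only through $\lambda$.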


Therefore, considering the two-dimensional Student's non-central
r.v. $(\tilde T_1,\tilde T_2)$ with non-centrality parameters:

$$\lambda_1 = \sqrt{n_1}\,\delta_1/\sigma, \qquad \lambda_2 = \sqrt{n_2}\,\delta_2/\sigma,$$

where $\delta_1 = \mu_1-\mu_1^*$ and $\delta_2 = \mu_2-\mu_2^*$, we can write the
probability of committing a type II error as follows:

$$\beta(\lambda_1,\lambda_2) = \Pr\{\tilde T_1(\lambda_1) \le t_1^*,\ \tilde T_2(\lambda_2) \le t_2^{**}\}
= \Pr\left\{\frac{Z_1+\lambda_1}{(\chi_g^2/g)^{1/2}} \le t_1^*,\ \frac{Z_2+\lambda_2}{(\chi_g^2/g)^{1/2}} \le t_2^{**}\right\}
= \Pr\{Z_1 - t_1^*(\chi_g^2/g)^{1/2} \le -\lambda_1,\ Z_2 - t_2^{**}(\chi_g^2/g)^{1/2} \le -\lambda_2\}, \qquad (14)$$

where $t_1^*$ and $t_2^{**}$ are the critical values according to the
region (13). For large values of $\lambda_1, \lambda_2$ in (14) we obtain:

$$\lim_{\lambda_i \to \infty} \beta(\lambda_1,\lambda_2) = 0 \qquad (i=1,2)$$

so that the power function, $1-\beta(\lambda_1,\lambda_2)$, converges
to unity.


From (14) it also follows that $\beta(\lambda_1,\lambda_2)$ is always a
decreasing function of $\lambda_1$ and $\lambda_2$. From this fact and
observing that for $\lambda_1 = \lambda_2 = 0$, (14) takes the form:

$$\beta(0,0) = \Pr\{Z_1 - t_1^*(\chi_g^2/g)^{1/2} \le 0,\ Z_2 - t_2^{**}(\chi_g^2/g)^{1/2} \le 0\}
= \Pr\{\tilde T_1(0) \le t_1^*,\ \tilde T_2(0) \le t_2^{**}\} = 1-\alpha,$$

it follows that the power function $1-\beta(\lambda_1,\lambda_2)$, for every pair
$(\lambda_1,\lambda_2)$ with $\lambda_1 \ge 0$, $\lambda_2 \ge 0$ and $(\lambda_1+\lambda_2) \ne 0$, is greater
than $\alpha$ and therefore the test is unbiased with respect to
$H_1 \wedge H_2$.


In an analogous manner it can be shown that the above
properties hold with respect to alternatives $H_{01} \wedge H_2$ and
$H_1 \wedge H_{02}$. The test can be easily modified to test $H_0 = H_{01} \wedge H_{02}$
against two-sided alternatives of the form $H_1: \mu_1 \ne \mu_1^*$,
$H_2: \mu_2 \ne \mu_2^*$.
TESTING OUTLIERS IN MULTIVARIATE DATA


SUMMARY. Given n random observations on a p-dimensional random
vector x, the problem is to test whether a specified number
(usually small) of suspected observations are outliers (too
discordant as compared to the bulk of observations). As a
generalization of Tiku's (1975, 1977) univariate statistic, we
propose a statistic g for testing a specified number of outliers
in multivariate data; g is the ratio of the product of robust
estimators (Tiku, 1980) to the product of ordinary estimators
of the scale parameters. For the multivariate normal, g is
shown to be considerably more powerful than the prominent statistic
R (restricted to the multivariate normal) due to Wilks (1963)
under location shifts (model A; Barnett and Lewis, 1978) although
slightly less powerful under scale changes (model B; Barnett
and Lewis). Like R, g is not sensitive to changes in correlations
(orientation). The statistic g can be used (under
models A or B) for testing outliers in samples from any multivariate
distribution whose marginal distributions are of the type


$(1/\sigma) f((x-\mu)/\sigma)$.


KEY WORDS. Multivariate outliers, censored samples, robust 
estimators, modified maximum likelihood estimators. 


1. FORMULATION THROUGH MARGINAL SAMPLES 


Barnett and Lewis (1978, pp. 220, 221) remark: We should 
not underestimate the role to be played by the marginal samples 
(that is, the univariate samples of each component value in the 
multivariate data) in the identification of outliers. Firstly, 
we know what we mean by a marginal sample. Secondly, we have 
facilities for testing the discordancy of such univariate outliers
for a range of different basic models (and we can adopt models 

to explain the outliers). And thirdly, perhaps most important, 

it is quite plausible to expect outliers to be exhibited within 
specific components of the multivariate observations. 


In the spirit of this remark, let $(x_{1i}, x_{2i}, \ldots, x_{pi})$,
$i = 1,2,\ldots,n$, be a random sample from a multivariate distribution
$F(x_1, x_2, \ldots, x_p)$; $E(x_k) = \mu_k$, $V(x_k) = \sigma_k^2$ and
$\mathrm{Cov}(x_k, x_\ell) = \rho_{k\ell}\sigma_k\sigma_\ell$, $\rho_{kk} = 1$. In the available n observations

$$x_{k1}, x_{k2}, \ldots, x_{kn} \qquad (1)$$

on the kth component $x_k$, however, $r_{k1}$ $(\ge 0)$ smallest and
$r_{k2}$ $(\ge 0)$ largest observations appear to be too small and too
large (potential outliers), respectively, as compared with the
bulk of observations in (1); $k = 1,2,\ldots,p$. The problem is to
test whether these suspected observations are in fact outliers.


For the univariate case (p = 1), a number of solutions to 
this problem are known; see Tietjen and Moore (1972) and Tiku 
(1975, 1977) and the references cited in these papers. For the 
multivariate case ($p \ge 2$), this problem is more complex and,
therefore, the solutions are relatively sparse; see, however,
Wilks (1963), Cox (1968), Healy (1968), Gnanadesikan and Kettenring
(1972), Barnett and Lewis (1978, Chapter 6) and Barnett
(1979). 


2. THE TEST STATISTIC 


Let

$$x_{k,r_{k1}+1},\ x_{k,r_{k1}+2},\ \ldots,\ x_{k,n-r_{k2}} \qquad (2)$$

be the Type II censored sample, obtained by arranging the observations
in the marginal sample (1) in ascending order of magnitude
and censoring the $r_{k1}$ smallest and the $r_{k2}$ largest
observations; $k = 1,2,\ldots,p$. In the bivariate case it will be
convenient to write $r = (r_{11}, r_{12}; r_{21}, r_{22})$. Let $\hat\sigma_k$ be the
robust estimator (modified maximum likelihood estimator; Tiku,
1967, 1980) based on the censored sample (2) and $s_k$ be the
ordinary estimator based on the complete sample (1). In fact,
$s_k$ is the estimator $\hat\sigma_k$ with $r_{k1} = r_{k2} = 0$; see the Appendix.
Pretend that the correlations $\rho_{k\ell}$ are known. A robust estimator
of the "internal scatter" of the multivariate sample is then given
by $|(\rho_{k\ell}\hat\sigma_k\hat\sigma_\ell)|$ and the corresponding ordinary estimator is
given by $|(\rho_{k\ell} s_k s_\ell)|$. Consider the ratio

$$g = \{|(\rho_{k\ell}\hat\sigma_k\hat\sigma_\ell)| / |(\rho_{k\ell} s_k s_\ell)|\}^{1/2} = \prod_{k=1}^{p} (\hat\sigma_k/s_k), \qquad (3)$$

since $|(\rho_{k\ell}\hat\sigma_k\hat\sigma_\ell)| = (\prod_{k=1}^{p}\hat\sigma_k^2)\,|(\rho_{k\ell})|$ and
$|(\rho_{k\ell} s_k s_\ell)| = (\prod_{k=1}^{p} s_k^2)\,|(\rho_{k\ell})|$. We propose g as a test statistic for testing
suspected outliers (see also Tiku, 1975, 1977); small values
of g indicate the presence of outliers among the n data
points $x_i$. The statistic g is a natural generalization of


Tiku's (1975, 1977) univariate statistic $\hat\sigma/s$. The explicit
expressions of $\hat\sigma_k$ and $s_k$ for the normal, exponential and


uniform populations are given in the Appendix. Note that g is 
location and scale invariant. However, g takes no explicit 
account of correlations (orientation) of the data. A statistic 
g* which is similar to g but takes explicit account of the 
correlations is developed in Section 4. 
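
The reason the assumed correlations drop out of g can be seen in the bivariate case: the determinant of a matrix with entries $\rho_{k\ell} a_k a_\ell$ equals $(a_1 a_2)^2 (1-\rho^2)$, so the factor $(1-\rho^2)$ cancels in the ratio. The sketch below verifies this numerically; the scale values are hypothetical placeholders rather than estimates from any data set in this paper.

```python
import math


def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def scatter(rho, a1, a2):
    """Matrix with entries rho_kl * a_k * a_l for p = 2 (rho_kk = 1)."""
    return [[a1 * a1, rho * a1 * a2],
            [rho * a1 * a2, a2 * a2]]


def g_from_determinants(rho, robust, ordinary):
    """Square root of the ratio of the two 'internal scatter' determinants."""
    return math.sqrt(det2(scatter(rho, *robust)) / det2(scatter(rho, *ordinary)))


def g_from_product(robust, ordinary):
    """Product of the component-wise ratios of robust to ordinary estimates."""
    return (robust[0] / ordinary[0]) * (robust[1] / ordinary[1])


robust_scales = (0.9, 1.1)     # hypothetical robust estimates
ordinary_scales = (1.2, 1.3)   # hypothetical ordinary estimates
```

Whatever value of $\rho$ with $|\rho| < 1$ is supplied, `g_from_determinants` returns the same number as `g_from_product`.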


The null (no outliers exist) distribution of $h(\hat\sigma_k/s_k)$, h
being a constant, is exactly Beta for the exponential and uniform
populations and approximately Beta for the normal population; see
the Appendix. Note that the p ratios $\hat\sigma_k/s_k$, $k = 1,2,\ldots,p$,
are uncorrelated (independent for the multivariate normal and
certain types of exponential and uniform populations; see the
Appendix) if all the correlations $\rho_{k\ell}$ are zero, but not so if
some are non-zero. Evidently then the null distribution of
g depends on the correlations $\rho_{k\ell}$. Luckily, however, this
dependence is not 'strong', that is, the mean and variance and
the percentage points of g do not change much with changes in
$\rho_{k\ell}$, at any rate for the bivariate normal. For the bivariate
normal, the simulated (based on 10,000 Monte Carlo runs) values
of the mean and variance and the lower 5 and 10 percent points
of g are given in Table 1. It is clear that these values do
not change much with changes in $\rho$; see also Table 3. The same
of course is true for the bivariate exponential and uniform populations
but we omit details for conciseness; see, however,
Table 3. The percentage points of g for $\rho=0$ can, therefore,
be used also for non-zero values of $\rho$. The equations for calculating
the approximate percentage points of g are given
in the Appendix.


Two most important models for generating a single outlier 
are the following (Barnett and Lewis, 1978, p. 210; Barnett, 
1978): 
TABLE 1: Simulated values of the mean and variance and the lower
5 and 10 percent points of the statistic g for the bivariate
normal.


n=10 n=20 
aN AON Moen: FS DE LSE SO 
-.8 .-.5 0) La :) wi: edt &, 0 ES) 8 


a (0, dh 3 0, 1) 


Mean#+ 5984: &985« ‘984 © 983; 9°985" "O97 = < 99. 99Gb S9a wee 
Var. .018 .019 .020 .021 .024 .0041 .0044 .0044 .0047 .0056 
5% 2 7288 FAFA 712 FOP! (678 IBIS BOF "eee G2” eas 
10% TIS ned B80r «787.» «78755 ied 7a oP09,.5808 47905 20D) moe 


a CO = 85RD) 


Mean, « 1.988. 06986 20984 982... ..98le «998.9997; 99], stat? Iam boelee 
Var. .023 .019 .019 .019 .019 .0053 .0045 .0044 .0043 .0042 
5% 688.5216. .c 716.4870 Segl? 4.852) - SO SERS Srs. Bite aero 
10% 0779)%.788 2788 > ..790) 794.45 903" 6908 e065 209.8 ee) 


5 (1, We Ihe 1) 
Mean’ -*°3972"* .967° "3965" 964 °9:968°9° 995° 997993" .992 9904) = 3991 
Vary 151052948046 +2045" 80468 .052°% (002 8009510095" -0097-*-o 1m 


5% ~948--4582° 1597). 580°? 2548 4 7947. 2813 PRSLD “Slee ss 
10% 048 672. ...676 ..070 SESLA 84s... 802.5%. GOUL. oot nan Oree 


Model A:   $E(x_i) = \mu + a$   (some i)   $(a \ne 0)$
           $E(x_j) = \mu$       $(j \ne i)$
           with variance-covariance matrix $V(x_j) = V$   $(j = 1,2,\ldots,n)$.


Model B:   $V(x_i) = bV$   (some i)   $(b > 1)$
           $V(x_j) = V$    $(j \ne i)$
           with mean vector $E(x_j) = \mu$   $(j = 1,2,\ldots,n)$.


The mean and variance of all the components $x_k$ can without any
loss of generality be assumed to be 0 and 1, respectively.
Generalization of the above models to more than one outlier is
straightforward.


Model A tends to generate a single 'inflated' (too small or
too large) order statistic only on one side (depending on the sign
of the kth component $a_k$ of a) of the marginal sample (1).
Model B tends to generate a single 'inflated' order statistic
on either side of the marginal sample (1); $E(x_k) = 0$. This
governs the choice of the values of $r_{k1}$ and $r_{k2}$ $(k = 1,2,\ldots,p)$,
the number of smallest and largest observations, respectively,
censored in the marginal sample (1), for calculating g. Under
model A, for example, if $a_k$ is greater than 0 (in fact, substantially
greater than 0) then $r_{k1} = 0$ and $r_{k2} = 1$,
$k = 1,2,\ldots,p$ (see Table 2). Under model B, $r_{k1} = r_{k2} = 1$
for all k. Another model which typically generates a single
'inflated' order statistic is Tiku's (1975, 1977) outlier model
(labeled slippage model of Barnett, 1978, p. 249) which adds a
constant to the largest order statistic or subtracts a constant
from the smallest order statistic of a random sample; under this
model the statistic g, like the univariate statistic $\hat\sigma/s$, is
particularly powerful; see also Tiku (1975, 1977) and Hawkins
(1977). In the rest of this paper, therefore, we study the
performance (power properties) of g only under models A and
B; power defined to be the percentage of the time a single (in
general, a specified number $r \ge 1$) outlier is detected which
is perhaps the most useful of the five measures (David and
Paulson, 1965) of the performance of an outlier test; see Hawkins
(1977, p. 436).


3. POWER COMPARISONS 


For testing a single outlier in the multivariate normal 
sample, the prominent statistic due to Wilks (1963) is given by 


R = min {|Bx, si st) | / |.» S. s») |} (4) 


where the numerator is the maximum likelihood estimator of the
'internal scatter' obtained by deleting one (out of n) observation
$(x_{1i}, x_{2i}, \ldots, x_{pi})$, $i = 1,2,\ldots,n$, at a time and the
denominator is the maximum likelihood estimator based on all the
n observations; R is thus the minimum of n ratios (of the
determinants of two $p \times p$ matrices). Note that R is location
and scale invariant and the null distribution of R is invariant
with respect to the correlations $\rho_{k\ell}$ (Wilks, 1963). The exact
null distribution of R is not tractable (Wilks, 1963; Barnett,
1979); see, however, Siotani (1959) who tabulates the approximate
percentage points of a similar statistic. Note that for testing
a specified number $r \ge 2$ of outliers, the statistic R is the
minimum of a large number $(n!/r!(n-r)!)$ of ratios like (4) and
the computation of R is therefore problematic. The computation
of g does not pose any problem, for testing any specified
number of outliers.
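
A minimal bivariate sketch of the leave-one-out computation described for R follows. It assumes the 'internal scatter' is the determinant of the maximum likelihood covariance matrix (divisor equal to the number of points used); the data are invented for illustration, with the last point planted far from the rest, and the function names are hypothetical.

```python
def ml_cov_det(points):
    """Determinant of the ML (divisor n) covariance matrix of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / n
    syy = sum((p[1] - mean_y) ** 2 for p in points) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    return sxx * syy - sxy * sxy


def wilks_r(points):
    """Return (R, index): the smallest leave-one-out ratio and where it occurs."""
    full = ml_cov_det(points)
    ratios = [ml_cov_det(points[:i] + points[i + 1:]) / full
              for i in range(len(points))]
    r = min(ratios)
    return r, ratios.index(r)


# Hypothetical data: nine points near the origin plus one distant point.
data = [(0.1, 0.2), (-0.3, 0.1), (0.5, -0.4), (-0.2, -0.6), (0.4, 0.3),
        (-0.5, 0.5), (0.0, -0.1), (0.3, -0.2), (-0.1, 0.4), (4.0, 4.0)]
r_value, suspect_index = wilks_r(data)
```

For a single outlier this costs n determinant evaluations; for r suspected outliers the same idea requires one evaluation per subset of size r, which is the combinatorial burden noted above.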
TABLE 2: Simulated values of the power of g, for testing a single
outlier in bivariate normal samples; $\sigma_1 = \sigma_2 = 1$; $a = (a_1, a_2)$.


Model A: (49% 4) replaced by (x) 44,95 Xy,+84%) 
r = (0, 1; 0, 1) 
a = (3,3) a= Cosh) Petar & eh 
op =0 p=.5 p=0 p= .5 p=0 p= .5 


5% GSB aer5S » atGn .50> NZOP 28 Teles 2285 06 +06 .06 .07 
10%). 535).00 2350 63 SSR E60 S29 G0y Sa pf eee lp oi 


Cy Re ele Toes CESS cha See rock CUE Sieh Seer =O7 ..06 -07 
10% 68) ~69" 248) “.64—" J40" 446) 286% 245" 2 als) S22 a) 


13S (0, P51; 0) 


is) 
| 
= 
Ww 
. 
| 
w 
~ 


a = (3,-1) a = (1,-1) 


Si) SOM GOO) ate dames Smee Oe tat Oe One Oe 06 «05 -07 
LOR a cS") 2 OF. Se oO edie SO ae SO. Nirene ee 1h <i2 10 3 


n = 20 


5%. asad 258) 232 sod) 620M .s0 le Se Ob -06 .06 - 06 
LOZ” 67 568 “She Gai SS Ae Se G2 Le pe i ee si2 


Model B: C14 Xo) replaced by (bx, ;> bx ;) 
ental teal eg) 
a Fick ade mina seo = 
b=2 b=8 b=2 b=8 b=2 b=8 b=2 b=8 


n=L0 n=20 


D%. el? i280) .80! Slee ole eng? wile lO SO .60l wt Teele me aes 
LO% .20°520'".85 184 20° .20°3851.82) .25? (22° 289-.88 25 22 F89h+86 


—_—_ OO ON 


For 4470) ryo7t and 54789979) the values of the power (for 


5 and 10 percent significance levels, respectively) of g 
are as follows: 


*   0.39 and 0.53 for n=10, and 0.40 and 0.52 for n=20; for all
    values of $\rho$.

**  0.39 and 0.53 for n=10, and 0.40 and 0.52 for n=20; for all
    values of $\rho$.

*** 0.07 and 0.14 for n=10, and 0.06 and 0.12 for n=20; for all
    values of $\rho$.
For bivariate normal population, the simulated (based on
2000 runs) values of the power of the statistics R and g for
testing a single outlier, generated under models A and B, are
given in Table 2, for $\rho = -.5$, 0 and .5. The values for $\rho = -.8$
and .8 were similar (rather more favorable to g) and are not
reproduced for brevity. It is clear that R and g are ineffective
if both the components of the suspected observation are inliers
(rather than outliers) in the respective marginal samples,
i.e., $a_1$ and $a_2$ are small, irrespective of the values of $\rho$.
On the other hand, R and g are effective if either of the
two components of the suspected observation is an 'inflated'
extreme order statistic (potential outlier) in the respective
marginal samples, i.e., $|a_1|$ or $|a_2|$ is large; see also
Examples 1 to 3. Under model A, g is more powerful than R.
Under model B, however, g is slightly less powerful than R and
this is to be expected since R is 'optimal' under model B; see
Barnett and Lewis (1978, p. 219). We would expect similar results
for p-variate normal population (p > 2) and for the detection
of more than one outlier but this needs further investigation.


Another important model is that of orientation (correlation 
changes). For example, a single outlier in the bivariate situation 
could be generated as follows: 


ModeLeG:—— corr’..coekt. — (x... ) =p (some i) 


corr. coeff. (2) 5 9% 5) == 04 #o (j #i) 
E(x, |.) = E(x) and V(xy 5) = V(x9 5) ea 12s 


Simulations reveal that both R and g (irrespective of the
choices of $r_{k1}$ and $r_{k2}$) are ineffective for detecting such
outliers; for example for $\rho = 0.8$ and $\rho_0 = 0.0$, the values of
the power of R and g [r = (0, 1; 0, 1)] are as follows:


n = 10 ne— 20 

R g R g 

5% JOG se n05 1OD.2 SOS 
OZ Allyl eal 0) rial le At (0) 


The procedures proposed by Gnanadesikan and Kettenring (1972) 
may be useful here; see also Healy (1968). Note, however, that 
no statistic can be expected to detect all types of outliers in 
multivariate data (Barnett and Lewis, 1978, p. 220). 


210 M. L. TIKU AND M. SINGH 


Example 1. Consider the following 10 observations supposed to 
come from a bivariate normal distribution: 


x): .4930 .0280 1.618 -.7700 =.059 
Xo? = be pe COD whe OZone Loe O12 
x3 cae tA «3184. oma LeG0en 22210. Crp acu 
Xy? Stora 049 .0265 2445 -0765 


The third pair (1.618, 1.702) is suspect; since both of its
components are the largest order statistics in the respective
marginal samples (potential outliers), we choose $r_{11} = r_{21} = 0$
and $r_{12} = r_{22} = 1$ for calculating the statistic g. The value
of g and the minimum value of R (which corresponds to the
pair (1.618, 1.702)), and the 5% points are as follows:


g 
Calculated value Os. 0.54 
5% point O22 0.71, (for, -p.=0, .Tabie 1). 


The above data were in fact obtained by generating 10 random
observations from a bivariate normal distribution ($\rho = 0.85$) and
replacing a pair $(x_{1i}, x_{2i})$, the third one, by $(x_{1i}+2, x_{2i}+2)$;
the 5% point of g for $\rho = .85$ is in fact 0.68. The statistic
g detects the outlier but not R, at 5% significance level.


Example 2. Consider the following 10 observations supposed to 
come from a bivariate normal: 


x): =, 0387" —T2119%. 6041 a1, 59ne— 597 
x! 4.424 1.67 - 669 .062 1.368 
x: ee. 1.449 -.841 1.714. S32 
Xo! 29) e395 786m) =. 578-1 O03 


The observation (-.038, 4.424) is suspect; since the component
-.038 is an inlier in the marginal $x_1$-sample but the component
4.424 is a potential outlier (the largest order statistic) in
the marginal $x_2$-sample, we choose r = (0, 0; 0, 1) for
calculating g. The value of g and the minimum value of R
(which corresponds to the pair (-.038, 4.424)), and the 5%
points are as follows:
                          R       g
Calculated value O24 0.68 
5% point Ona2 Ole. Sees 


the percentage points of g for this situation are given by
Tiku (1975, Table 1). The above observations were in fact
obtained by generating 10 random observations from a bivariate
normal distribution ($\rho = 0$) and replacing a pair $(x_{1i}, x_{2i})$,
the first one, by $(x_{1i}, x_{2i}+4)$; g detects the outlier
but not R, at 5% significance level.


Example 3. Consider the following 10 observations supposed to 
come from a bivariate normal: 


x13 2 Teor ls23u. 423909 +308» -.062 


Xo? -.082 ree AAT? dh 7a9pes. 120 


me 6 L.O59> t= 177 ©=.015 =. 018—-=.723 


1 
eae Soy 8) ie tee ee Ye amie Ari 40 4 eal 


2 
It is difficult to tell which particular pair is suspect, if
at all; perhaps (-1.659, .523) or (-.018, 2.2024). To calculate
g we take r = (1, 0; 0, 0) for the first pair and r = (0, 0; 0, 1)
for the second pair. The values of g and the minimum value of
R (which corresponds to the pair (-1.659, .523)) are as follows:


                          R       g
Calculated value        0.45    0.96   (for the pair (-1.659, .523))
                        0.63    1.05   (for the pair (-.018, 2.2024))
5% point                0.22    0.79;


0.63 is the second smallest value of R. The percentage points
of g are identical for the two pairs because of the symmetry
of bivariate normal and the fact that the marginal distributions
are also symmetric and normal. The above observations were
obtained as in the earlier examples, with the seventh pair replaced
by $(x_{1i}+1, x_{2i}+1)$; both R and g are ineffective
here. Using g one could, of course, very easily test the
significance of the two pairs (-1.659, .523) and (-.018, 2.2024)
simultaneously, i.e., calculating g with r = (1,0; 0,1). The
resulting value of g, however, turns out to be not significant,
as expected; both R and g miss the inlier (-.177, .339).


There might be situations (we had a hard time generating one)
when R will succeed in detecting an outlier but not g. However, 
it is clear from the above examples and the values given in Table 
2 that g will succeed more often than R, under model A. 
4. AN ALTERNATIVE STATISTIC


The statistic g was developed under the pretension that
the correlation coefficients $\rho_{k\ell}$ are known. For the multivariate
normal population, it is possible to define a statistic,
say g*, without this pretension; g* is the square root of the
ratio of the determinants of 'robust' variance-covariance matrix
$(\hat\sigma_{k\ell})$ and 'ordinary' variance-covariance matrix $(s_{k\ell})$. The
2 , : 

estimators Ok are the same as om (eq. 3) and the estimators 

are given b 
Ovo g y 


somg2hgayQeadh? 
Og = (or + or - 64) /2 


a 2 
where op is the robust estimator of the variance 65 ‘ oO + 
2) 


Oo + 2019 O, J») calculated from the n values 


Xia loengi. node Re be 


Xa ot X05 if Sig €0, "fae 
n = : F 
Sig = bar Oa, = XX eae Remember to change the sign of ray slg 
Sho < 0. Note that the sample ZyoZos° hoz will hopefully have 


fewer outliers than either of the two samples QE 
SETAE? PVE 
and x 3c CACHE eb aa 5 a is calculated exactly the same ey 
COs ie 2 ine D 


as om (or ore) with the ordered observations XG (or Xy.) 


replaced by the ordered observations Zo» and and r 


kl k2 
(or Te and Loo)» the number of smallest and largest 


observations censored, replaced by r* = max(r and 


1 kl? “ev 


rx = max(r) 4» Loo)> respectively. The efficiency-properties 


7) 
of the variance-covariance matrix $(\hat\sigma_{k\ell})$ are discussed by Tiku
and Singh (1980); suffice it to say here that in the presence of
outliers $\hat\sigma_{k\ell}$ have considerably smaller mean square errors than
$s_{k\ell}$. Note that for normal samples $\hat\sigma_{k\ell}$ is unbiased for large n,
since $\hat\sigma_k^2$, $\hat\sigma_\ell^2$ and $\hat\sigma_D^2$ are all unbiased for large n; see
Tiku (1978, Lemma 2). The statistic g* is thus given by

$$g^* = \{|(\hat\sigma_{k\ell})| / |(s_{k\ell})|\}^{1/2};$$

it is possible that a value of $|(\hat\sigma_{k\ell})|$ less than zero might
occur but such a value is replaced by zero. Small values of g*
indicate the presence of outliers. At the present time, we have 
no clue how to obtain the null (exact or approximate) distribution 
of g*. However, we carried out a simulation study of the power- 
properties of g* and the results are given below for the bi- 
variate normal population, n = 20: 


Simulated values of the power (single outlier) 
Model:A:r = (0, 1: °0,.-1) 
(a,4,) 


o = 0 (Vea) 


rereent Point (3.3) (3,1).(1.4)...Percent, Point, (3,3) (351)..45)) 


5h 87 -54 Be OF, 5% 87 -40 -28 06 
10% aod -67 - 46 ails! 10% 391 -54 -41 213 


Model, Bir Hal 5 lk, 


5% » Oi: -13 84 5% -82 see 
10% 86 pa 87 .» 10% 86 2 


The statistic g* is also insensitive to correlation changes; 

the values of the power being .05 and .10 for 5 and 10 percent 
significance levels, respectively. As compared with the statistic 
R, g* is slightly more powerful under model A but slightly less 
powerful under model B. However, g* is clearly less powerful 
than g and is also more difficult to compute. 


Acknowledgement: Thanks are due to the NSERC of Canada and 
McMaster University Research Board for research grants. 
APPENDIX


Let $X_a, X_{a+1}, \ldots, X_b$ ($a = r_1 + 1$, $b = n - r_2$) be the Type II
censored sample, obtained by arranging a random sample
$X_1, X_2, \ldots, X_n$ of size $n$ in ascending order of magnitude and
censoring the $r_1$ smallest and the $r_2$ largest observations.

The expressions for the robust (modified maximum likelihood;
Tiku, 1967, 1980) estimators of the scale parameters of normal,
exponential and uniform populations are as follows:

Normal. The MML (modified maximum likelihood) estimator of $\sigma$ is given by


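$$\hat\sigma = \{B + \sqrt{B^2 + 4AC}\}/2\sqrt{A(A-1)}, \qquad (5)$$
where $A = n - r_1 - r_2$, $B = r_2\alpha_2 X_b - r_1\alpha_1 X_a - (r_2\alpha_2 - r_1\alpha_1)K$
and
$$C = \sum_{i=a}^{b} X_i^2 + r_2\beta_2 X_b^2 - r_1\beta_1 X_a^2 - mK^2;$$
$m = n - r_1 - r_2 + r_2\beta_2 - r_1\beta_1$, $K = (\sum_{i=a}^{b} X_i + r_2\beta_2 X_b - r_1\beta_1 X_a)/m$. If $q_1 = r_1/n = q$
then $\alpha_1 = \alpha$ and $\beta_1 = -\beta$, and if $q_2 = r_2/n = q$ then $\alpha_2 = \alpha$
and $\beta_2 = \beta$. The values of $\alpha$ and $\beta$ are given by Tiku (1967,
table 1). For $n \ge 10$, however, the values of $\alpha$ and $\beta$ are
obtained from the following equations (Tiku, 1970):

$$\beta = -f(t)\{t - f(t)/q\}/q \quad\text{and}\quad \alpha = \{f(t)/q\} - \beta t,$$

where $t$ is determined by the equation $\Phi(t) = 1 - q$; $f$ is the standard
normal density and $\Phi$ is the standard normal cdf.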
-.2 
The ordinary estimator of o is given by s = COW a / 
(n-1)}, that is, the estimator @ with rj =r5=0. The distribu- 
tion of {(n- Ty r,-1)/(n- 1)}(6/s) is approximately Beta 


B(n-r -1, ea see’ Tiku (197542 pY #47). 


om ey a 
Exponential. The MML estimator of $\sigma$ is given by

$$\hat\sigma = \Big\{\sum_{i=a}^{b} X_i + r_2 X_b - (n - r_1)X_a\Big\}/(n - r_1 - r_2 - 1); \qquad (6)$$

the ordinary estimator is given by $s = (\sum_{i=1}^{n} X_i - nX_1)/(n-1)$. The
distribution of $\{(n - r_1 - r_2 - 1)/(n-1)\}(\hat\sigma/s)$ is exactly Beta
$B(n - r_1 - r_2 - 1,\ r_1 + r_2)$; see Tiku (1975).

Uniform. The MML estimator of $\sigma$ is given by

$$\hat\sigma = (X_b - X_a)/(n - r_1 - r_2 - 1); \qquad (7)$$

the ordinary estimator is given by $s = (X_n - X_1)/(n-1)$. The
distribution of $\{(n - r_1 - r_2 - 1)/(n-1)\}(\hat\sigma/s)$ is exactly Beta
$B(n - r_1 - r_2 - 1,\ r_1 + r_2)$; see Tiku (1975, p. 749).
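
The exponential and uniform estimators are simple enough to compute directly. The following is a minimal Python sketch, assuming the readings of (6) and (7) in which $X_a$ and $X_b$ are the smallest and largest retained order statistics; the function names are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mml_scale_exponential(sample: Sequence[float], r1: int, r2: int) -> float:
    """Scale estimate from (6): censor the r1 smallest and r2 largest values."""
    x = sorted(sample)
    n = len(x)
    a, b = r1, n - r2 - 1  # 0-based positions of X_a and X_b
    if b <= a:
        raise ValueError("too many observations censored")
    retained_sum = sum(x[a:b + 1])
    return (retained_sum + r2 * x[b] - (n - r1) * x[a]) / (n - r1 - r2 - 1)


def mml_scale_uniform(sample: Sequence[float], r1: int, r2: int) -> float:
    """Scale estimate from (7): range of the retained order statistics."""
    x = sorted(sample)
    n = len(x)
    a, b = r1, n - r2 - 1
    if b <= a:
        raise ValueError("too many observations censored")
    return (x[b] - x[a]) / (n - r1 - r2 - 1)


data = [0.3, 1.2, 0.7, 2.5, 0.1, 4.0]
print(mml_scale_exponential(data, 1, 1), mml_scale_uniform(data, 1, 1))
```

With `r1 = r2 = 0` both functions reduce to the ordinary estimators $s$ quoted above, which is a convenient sanity check.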

If $\rho_{k\ell} = 0$ for all $k$ and $\ell$ ($k \ne \ell$), then the $p$
ratios $\hat\sigma_k/s_k$, $k = 1, 2, \ldots, p$, in the expression (3) are mutually
independent for the multivariate normal and certain types of
multivariate exponential and uniform populations, for example
Gumbel's bivariate exponential family (Johnson and Kotz, 1972,
p. 261):


eT eye Save (1 ei CEBH) LET. ete ye OF (8) 


For the population (8) with $\theta = 0$ (i.e. correlation $\rho = 0$),
the distribution of $hg = \{(n - r_1 - 1)(n - r_2 - 1)/(n-1)^2\}(\hat\sigma_1/s_1)(\hat\sigma_2/s_2)$
is exactly the same as that of the product of two independent
Beta variates $B(n - r_1 - 1, r_1)$ and $B(n - r_2 - 1, r_2)$; $r_1 = r_{11} + r_{12}$
and $r_2 = r_{21} + r_{22}$. Let then $u$ be the product of $m$ independent
Beta variates $B(n - r_i - 1, r_i)$, $i = 1, 2, \ldots, m$. The
moments of $u$ are easy to work out and the first four moments
can be used to fit Pearson curves from the Johnson et al. (1963)
tables; these curves give accurate approximations to the percentage
points of numerous distributions (Johnson et al.). Expressions
for the probability density function $f(u)$ of $u$ and
its probability distribution function $F(u)$ can also be worked
out. For example, we have the following expressions:


$m = 2$ and $r_1 = r_2 = 1$:

$$f(u) = -(n-2)^2 u^{n-3} \log u, \qquad 0 < u < 1,$$
$$F(u) = u^{n-2}\{1 - (n-2)\log u\};$$

$m = 2$ and $r_1 = r_2 = 2$:

$$f(u) = (n-2)^2 (n-3)^2 u^{n-4}\{2(u-1) - (1+u)\log u\}, \qquad 0 < u < 1,$$
$$F(u) = u^{n-3}[(n-2)^2\{1 - 2(n-3) - (n-3)\log u\} + (n-3)^2 u\{2n - 3 - (n-2)\log u\}];$$

$m = 3$ and $r_1 = r_2 = r_3 = 1$:

$$f(u) = (n-2)^3 u^{n-3} (\log u)^2/2,$$
$$F(u) = u^{n-2}[1 + \{1 - (n-2)\log u\}^2]/2;$$

$m = 4$ and $r_1 = \cdots = r_4 = 1$:

$$f(u) = -(n-2)^4 u^{n-3} (\log u)^3/6,$$
$$F(u) = -u^{n-2}\{(n-2)^3(\log u)^3 - 3(n-2)^2(\log u)^2 + 6(n-2)\log u - 6\}/6.$$


The above equations were used to obtain the exact percentage
points d given in Table 3; Table 3 also gives the simulated (based
on 100,000/n Monte Carlo runs) values of the probability P(g < d)
for the bivariate exponential and normal populations. It is clear
that the above equations provide reasonable approximations.
Incidentally, the Pearson curves (Johnson et al., 1963) based on
the first four moments of u produce (even with linear interpolation
in the Johnson et al. tables) exactly the same values of d as
those given in Table 3.
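
As an illustration of how such exact percentage points can be obtained, here is a minimal Python sketch for the simplest case $m = 2$, $r_1 = r_2 = 1$. It assumes the closed form $F(u) = u^{n-2}\{1 - (n-2)\log u\}$ and the scaling $d = u/h$ with $h = (n-2)^2/(n-1)^2$ described in the footnote to Table 3; the bisection solver and the function names are illustrative choices, not the authors' procedure.

```python
import math


def cdf_product_two_beta(u: float, n: int) -> float:
    """F(u) for the product of two independent Beta(n-2, 1) variates."""
    if u <= 0.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    a = n - 2
    return u ** a * (1.0 - a * math.log(u))


def percentage_point(alpha: float, n: int, tol: float = 1e-12) -> float:
    """Solve F(u) = alpha by bisection and return d = u / h."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf_product_two_beta(mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    h = ((n - 2) / (n - 1)) ** 2  # (n-r1-1)(n-r2-1)/(n-1)^2 with r1 = r2 = 1
    return u / h


for n in (10, 20):
    print(n, round(percentage_point(0.05, n), 3))
```

For n = 10 and n = 20 this should give values close to the 5 percent points .699 and .856 listed in the first block of Table 3.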
TABLE 3: Simulated values of the probability P(g < d), d being
the exact* 5 and 10 percent points.


a ew a ee 


Bivariate exponential** Bivariate normal 
p p 
n d 0.0 O25 0.8 -.8 -.5 0 = as} 


eS (O, ‘leg 0, 1) 


10 699 Wy .057 Ae) .038 041 044 -048 .058 
aa hi dss .104 -105 eek, .084 .090 094 -095 .108 


20 856 .050 5055 .069 -035 041 042 . 046 -058 
898 .104 : 101 e113 .083 084 .090 094 2102 


40 3929 OS .056 .068 .034 ~U99 037 eye iy 042 
- 95k -104 ~LO2 - 116 -084 .086 - 081 - 080 .084 


ras (O, 13 fa 0) 


10 2099 -050 047 -047 eee 045 .043 -046 .043 
1/8 -099 O97. «092 .099 - 086 092 090 .084 


20 - 856 -048 SOE .046 052 042 042 W039 .036 
.898 .098 LO .096 093 .087 .088 .083 - 080 


40 ie 7A] -050 -046 044 .048 045 042 .036 -034 
ot .096 .098 -104 094 . 088 O91 079 .080 


Gee aie. ete) 


10 - 584 950 054 -067 063 - OS) 044 052 -065 
BSW ie) . 100 -101 etude splat ~102 nO ~L05 -116 


20 -802 O50 056 .067 -055 -044 .039 044 .058 
OD -104 .103 . 108 - 104 .088 . 088 094 «LO7 


40 . 903 One 7 0D5 064 043 034 «037 -036 044 

- 930 -102 -102 2 096 .086 .086 085 J092 

a a a ee 

* Exact percentage points of the distribution of $u/h$, $u$ being
the product of two independent Beta variates $B(n - r_1 - 1, r_1)$ and
$B(n - r_2 - 1, r_2)$, $h = (n - r_1 - 1)(n - r_2 - 1)/(n-1)^2$,
and $r_i = r_{i1} + r_{i2}$.

** This is Moran's bivariate exponential family (Johnson and Kotz,
1972, p. 267) which is easy to generate; for this family
correlation $\rho \ge 0$.
A NORMAL APPROXIMATION FOR THE MULTIVARIATE 
LIKELIHOOD RATIO STATISTICS 


GOVIND S. MUDHOLKAR* 


Department of Statistics 
University of Rochester 
Rochester, New York 14627 USA 


MADHUSUDAN C. TRIVEDI 


Pennwalt Corporation 
Pharmaceutical Division
P.O. Box 70

Rochester, New York 14603 USA 


SUMMARY. For many multivariate hypotheses, under the normality 
assumptions, the likelihood ratio tests are optimal in the sense 
of having maximal exact slopes. The exact distributions needed 
for implementing these tests are complex and their tabulation 

is limited in scope and accessibility. In this paper, a method 
of constructing normal approximations to these distributions is 
described, and illustrated using the problems of testing 
sphericity and independence between two sets of variates. The 
normal approximations are compared with well-known competing 
approximations and are seen to fare well. 
1. INTRODUCTION 


For most testing of hypothesis problems in multivariate 
analysis, under the normality assumption, several reasonable 
solutions of comparable merit exist. These include the tests 
resulting from the union-intersection principle, the class of 
likelihood ratio criteria and ad hoc statistics such as Bartlett-
Pillai trace for MANOVA. The Neyman-Pearson theory provides some 
information on the operating characteristics of these procedures, 
but does not indicate any of the contenders as superior. How- 
ever, as demonstrated by Hsieh (1979), the likelihood ratio tests 
for many of the multivariate hypotheses have maximal exact 
slopes, i.e., they are asymptotically optimal according to 
Bahadur's (1967) method of comparing tests. From a practical 
standpoint the null distributions of the likelihood ratio statis- 
tics or of their competitors, are of crucial importance. These 
distributions, where available, are complex, their tables are 
generally limited in scope and not often accessible. Moreover, 
the tabulations concern only selected percentiles and are inade- 
quate for computing the p-values needed in practice. The prag- 
matic approach to such distribution problems from early days 
(e.g., Neyman and Pearson, 1931) is to seek reasonably accurate 
and convenient approximations to the distributions. 


The principal methods of approximating a likelihood ratio 
use the fact that, in large samples, its distribution is approxi- 
mately of Pearson type I form and that of its negative logarithm 
is of type III, i.e., chi-square, form. Nayer (1936) following 
a suggestion by Neyman and Pearson (1931) used the moments to 
approximate the percentiles for testing the homogeneity of 
variances in this manner. Bishop (1939), on the other hand, 
obtained empirical expressions for the parameters for a type I 
approximation, bypassing the intermediate stage of computing the 
moments. Bartlett (1937) pursuing the asymptotic chi-square 
character of a negative multiple of log-likelihood ratio, pointed 
out by Neyman and Pearson (1931), used moments to approximate it 
by a scaled chi-square variable for samples of moderate size. 
This approximation deteriorates as the size of the problem, as 
measured by the dimension of the multivariate normal distribution 
or by the number of populations in the problem increases, or when 
the effective sample size is small. A comprehensive investigation 
of various approximations was conducted by Box (1949), in which 
he introduced new widely known and used asymptotic chi-square 
series approximations for the distributions of likelihood ratios. 
Box studied his series approximations, in the context of two 
multivariate problems, comparing them with the exact distributions 


and with several other approximations including one based on the 
F-distribution. 
The purpose of this essay is to describe a method for con- 
structing a Gaussian approximation to the null distribution of 
the likelihood ratio, and to demonstrate its efficacy and rele- 
vance in testing multivariate hypotheses. The normal approxima- 
tion is outlined in Section 2. It is illustrated using two 
common multivariate problems, namely testing independence of two 
sets of variates and testing the sphericity hypotheses. Section 
3 contains the likelihood ratio statistics for the two problems 
together with current approximations for their null distributions. 
These approximations are then numerically compared with the new 
normal approximation in Section 4. 


2. A NORMAL APPROXIMATION FOR THE 
LIKELIHOOD RATIO A 


Let $Y_1, Y_2, \ldots$ be a sequence of asymptotically normally
distributed nonnegative random variables. The convergence
of the distribution of $Y_n$ to normality can be accelerated by
approximately symmetrizing it with a transformation as follows:

Let $\kappa_r = \kappa_r(n)$, $r = 1, 2, \ldots$, denote the cumulants of
$Y = Y_n$ and suppose that $\kappa_1 \to \infty$ and $b_r = \kappa_r/\kappa_1$, $r = 2, 3, \ldots$, are
bounded as $n \to \infty$. Then using the Taylor series it is easy to
obtain the following asymptotic expansion for the expectation
$E(Y/\kappa_1)^h$ of a power of $Y$:


$$\mu_1'(h) = 1 + \frac{h(h-1)}{2}\,\frac{b_2}{\kappa_1} + \frac{h(h-1)(h-2)}{24\kappa_1^2}\,[4b_3 + 3(h-3)b_2^2] + O(\kappa_1^{-3}). \qquad (1)$$

From this the $r$th moment of $(Y/\kappa_1)^h$ can be obtained by substituting
$(rh)$ for $h$ in (1). The following central moments of
$(Y/\kappa_1)^h$ are then obtained in a routine manner:

$$\mu_2(h) = \frac{h^2 b_2}{\kappa_1} + \frac{h^2(h-1)}{2\kappa_1^2}\,[2b_3 + (3h-5)b_2^2] + O(\kappa_1^{-3}), \qquad (2)$$

$$\mu_3(h) = \frac{h^3}{\kappa_1^2}\,[b_3 + 3(h-1)b_2^2] + O(\kappa_1^{-3}), \qquad (3)$$

$$\mu_4(h) = \frac{3h^4 b_2^2}{\kappa_1^2} + O(\kappa_1^{-3}).$$
Since $Y$ is asymptotically normally distributed as $n \to \infty$, by
the Mann-Wald (1943) theorem so is an appropriately normalized
$(Y/\kappa_1)^h$. This convergence to normality is accelerated if $h$ is
chosen so that the leading term in the expansion (3) for $\mu_3(h)$
vanishes. This value $h_0$ of $h$, which approximately symmetrizes
$(Y/\kappa_1)^h$, is obtained from (3) as

$$h_0 = 1 - \kappa_1\kappa_3/(3\kappa_2^2).$$

The distribution of $(Y/\kappa_1)^{h_0}$ may be approximated by the normal
distribution with mean $\mu_1'(h_0)$ and variance $\mu_2(h_0)$ given in
(1) and (2), respectively. Therefore,

$$\Pr(Y \le y) \approx \Phi[\{(y/\kappa_1)^{h_0} - \mu_1'(h_0)\}/\sigma(h_0)], \qquad (4)$$

where $\sigma^2(h_0) = \mu_2(h_0)$ is given by (2).
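
The recipe in (1) to (4) can be checked on a familiar case. The Python sketch below is only an illustration under the reading of (1), (2) and $h_0$ given in this section; the function names are hypothetical. For a chi-square variable with $\nu$ degrees of freedom the cumulants are $\kappa_1 = \nu$, $\kappa_2 = 2\nu$, $\kappa_3 = 8\nu$, so $h_0 = 1/3$ and the approximation coincides with the Wilson-Hilferty cube-root transformation.

```python
import math


def symmetrizing_power(k1: float, k2: float, k3: float) -> float:
    """h0 = 1 - k1*k3 / (3*k2^2), the power that removes leading skewness."""
    return 1.0 - k1 * k3 / (3.0 * k2 ** 2)


def power_moments(k1: float, k2: float, k3: float, h: float):
    """Mean (1) and variance (2) of (Y/k1)^h, dropping the O(k1^-3) terms."""
    b2, b3 = k2 / k1, k3 / k1
    mean = (1.0 + h * (h - 1) * b2 / (2 * k1)
            + h * (h - 1) * (h - 2) * (4 * b3 + 3 * (h - 3) * b2 ** 2)
            / (24 * k1 ** 2))
    var = (h * h * b2 / k1
           + h * h * (h - 1) * (2 * b3 + (3 * h - 5) * b2 ** 2) / (2 * k1 ** 2))
    return mean, var


def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def approx_cdf(y: float, k1: float, k2: float, k3: float) -> float:
    """Normal approximation (4) to Pr(Y <= y); assumes h0 is nonzero."""
    h0 = symmetrizing_power(k1, k2, k3)
    mean, var = power_moments(k1, k2, k3, h0)
    return normal_cdf(((y / k1) ** h0 - mean) / math.sqrt(var))


nu = 10
# 18.307 is the tabulated upper 5% point of chi-square with 10 d.f.
print(symmetrizing_power(nu, 2 * nu, 8 * nu), approx_cdf(18.307, nu, 2 * nu, 8 * nu))
```

The printed probability should be close to 0.95, the exact value at that point.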


It is well known, e.g., see Anderson (1958) or Srivastava
and Khatri (1979), that for many likelihood ratio statistics $\Lambda$
appearing in multivariate analysis under the normality assumption,
$U = \Lambda^{2/N}$ is distributed as a product $\prod X_i$ of independent beta
variates $X_i$, $i = 1, 2, \ldots, k$, distributed according to
$B(X_i; a_i, b_i)$, where $N$ is the number of observations. Equivalently,
we have $-\log U = \sum_{i=1}^{k} (-\log X_i)$ in distribution.


Now, it can be shown that, as $a_i$ and $b_i \to \infty$, $-\log X_i$
converges in law to normality. Hence, it is possible to construct a
normal approximation for $U$ as described above. Towards this
end we need the cumulants of $-\log X_i$. The moment generating
function of $-\log X_i$ is easily seen to be $M(t) = B(a_i - t, b_i)/B(a_i, b_i)$.
Hence, the cumulant generating function is

$$K(t) = \log[\Gamma(a_i + b_i)/\Gamma(a_i)] - \log[\Gamma(a_i + b_i - t)/\Gamma(a_i - t)].$$

Differentiating and using $\psi(z) = \frac{d}{dz}\log\Gamma(z)$, the $r$th cumulant
of $-\log X_i$ is

$$\kappa_r = (-1)^r\{\psi^{(r-1)}(a_i) - \psi^{(r-1)}(a_i + b_i)\}.$$
But $\psi'(z) = \sum_{j=0}^{\infty} (z+j)^{-2}$, giving

$$\kappa_r = (r-1)!\Big[\sum_{j=0}^{m-1} (a_i + j)^{-r} + \sum_{j=m}^{\infty} \{(a_i + j)^{-r} - (a_i + j + \nu)^{-r}\}\Big], \qquad (5)$$

where $m$ denotes the largest integer in $b_i$ and $\nu = b_i - m$.

The cumulants of $U' = -\log U$ obtained using (5) are

$$\kappa_r(U') = (r-1)!\Big[\sum_{i=1}^{k}\sum_{j=0}^{m-1} (a_i + j)^{-r} + \sum_{i=1}^{k}\sum_{j=m}^{\infty} \{(a_i + j)^{-r} - (a_i + j + \nu)^{-r}\}\Big]. \qquad (6)$$

If $b_i$ is an integer then the second sum in (6) vanishes and

$$\kappa_r(U') = (r-1)!\sum_{i=1}^{k}\sum_{j=0}^{m-1} (a_i + j)^{-r}. \qquad (7)$$


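From (7) we observe that as either $k$ or $b_i$ or both $\to \infty$, $\kappa_1$
diverges, but $\kappa_r/\kappa_1$, $r > 1$, are bounded. That is, $b_r = \kappa_r/\kappa_1 = O(1)$.
Hence, it is possible to construct the normal approximation to
the distribution of $\Lambda$ as described above. Thus, from (4) we
get

$$\Pr(\Lambda \ge \lambda) \approx \Phi[\{(\lambda'/\kappa_1)^{h_0} - \mu_1'(h_0)\}/\sigma(h_0)], \qquad (8)$$

where $\lambda' = -2(\log\lambda)/N$. The $100(1-\alpha)$th percentile $\lambda_{1-\alpha}$ can be
approximated through

$$\lambda'_{1-\alpha} = \kappa_1[z_\alpha\sigma(h_0) + \mu_1'(h_0)]^{1/h_0}, \qquad (9)$$

where $\lambda'_{1-\alpha} = -2(\log\lambda_{1-\alpha})/N$ and $z_\alpha$ denotes the $100\alpha$th percentile
of the standard normal variate.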


3. TWO APPLICATIONS IN MULTIVARIATE ANALYSIS 


The normal approximation derived in the previous section is 
now illustrated and later examined in the context of the multi- 
variate problems of testing independence between two sets of nor- 
mal variates and testing the sphericity hypothesis. 


3.1 Independence Between Two Sets. Let $X = (X_1, \ldots, X_{p_1})'$
and $Y = (Y_1, \ldots, Y_{p_2})'$, $p_1 \le p_2$, $p_1 + p_2 = p$, be jointly
normally distributed with $\mathrm{Var}(X) = \Sigma_{11}$, $\mathrm{Var}(Y) = \Sigma_{22}$ and
$\mathrm{Cov}(X, Y) = \Sigma_{12}$. The hypothesis of independence between $X$ and
$Y$ is $H_0$: $\Sigma_{12} = 0$. If $S$ is the usual estimate of $\Sigma$ based on
a sample of size $N$, then the likelihood ratio statistic for
testing $H_0$ is

$$\Lambda = \{|S|/(|S_{11}|\,|S_{22}|)\}^{N/2},$$

where $S_{11}$ and $S_{22}$ are the submatrices of $S$ corresponding
to $\Sigma_{11}$ and $\Sigma_{22}$, respectively. The exact distribution of the
statistic $\Lambda$ is given, and tabulated for some values of $p_1$ and
$p_2$, by several authors (e.g., see Krishnaiah, 1979, and Consul,
1967a). Among various approximations proposed for the null
distribution of $\Lambda$, two are well known and widely used in statistical
packages such as BMDP (see Engelman et al., 1977). These are

(i) the chi-square series approximation due to Box (1949) and

(ii) the F approximation due to Rao (1948).


Box-Approximation. Let $w = p_1 p_2$, $m = N - (p_1 + p_2 + 3)/2$,
$\gamma_2 = w(p_1^2 + p_2^2 - 5)/48$, $\gamma_4 = \gamma_2^2/2 + w\{3(p_1^4 + p_2^4) + 10p_1^2p_2^2 - 50(p_1^2 + p_2^2)
+ 159\}/1920$. Then,

$$\Pr(-m\log U \le z) = \Pr(\chi^2_w \le z) + \gamma_2\{\Pr(\chi^2_{w+4} \le z) - \Pr(\chi^2_w \le z)\}/m^2$$
$$\qquad + [\gamma_4\{\Pr(\chi^2_{w+8} \le z) - \Pr(\chi^2_w \le z)\} - \gamma_2^2\{\Pr(\chi^2_{w+4} \le z) - \Pr(\chi^2_w \le z)\}]/m^4 + O(N^{-6}), \qquad (10)$$

where $\chi^2_k$ denotes a chi-square variable with $k$ degrees of freedom
and $U = \Lambda^{2/N}$.

Rao-Approximation. Let $m' = N - (p_1 + p_2 + 3)/2$, $L = (p_1p_2 - 2)/4$,
$s = \sqrt{(p_1^2p_2^2 - 4)/(p_1^2 + p_2^2 - 5)}$. Then,

$$F = (m's - 2L)(1 - U^{1/s})/(p_1p_2U^{1/s}) \qquad (11)$$

has approximately an F-distribution with $p_1p_2$ and $m's - 2L$ degrees of
freedom.


Now, it is well known (e.g., see Anderson, 1958, p. 236) that
under $H_0$ the likelihood ratio statistic $\Lambda$ satisfies
the equivalence $\Lambda^{2/N} = U = \prod X_i$ in law, where $X_i$
($i = 1, 2, \ldots, p_1$) are independently distributed according to
beta distributions $B\{X_i; (N - p_2 - i)/2,\ p_2/2\}$. The normal
approximation developed in the previous section can be specialized in
this case by taking $k = p_1$, $a_i = (N - p_2 - i)/2$, $b_i = p_2/2$ in
the expressions (6) for the cumulants, (8) for the probabilities,
and (9) for the percentiles of $\Lambda$.
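
To make the specialization concrete, the following Python sketch strings together (7), $h_0$, (1), (2) and (9) for the independence problem. It is a sketch under stated assumptions rather than the authors' program: it takes $k = p_1$, $a_i = (N - p_2 - i)/2$, $b_i = p_2/2$, requires $p_2$ to be even so that the integer-$b_i$ formula (7) applies, and returns the lower percentile of $U = \Lambda^{2/N}$ on the assumption that this is the quantity tabulated. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def cumulant(a_values, m: int, r: int) -> float:
    """r-th cumulant of -log U from (7); every b_i equals the integer m."""
    return math.factorial(r - 1) * sum(
        (a + j) ** (-r) for a in a_values for j in range(m))


def lower_percentile_of_u(p1: int, p2: int, n_obs: int, q: float) -> float:
    """Approximate 100q-th percentile of U via the normal approximation (9)."""
    if p2 % 2:
        raise ValueError("this sketch needs even p2 so that b_i is an integer")
    a_values = [(n_obs - p2 - i) / 2 for i in range(1, p1 + 1)]
    if min(a_values) <= 0:
        raise ValueError("sample size too small")
    m = p2 // 2
    k1, k2, k3 = (cumulant(a_values, m, r) for r in (1, 2, 3))
    b2, b3 = k2 / k1, k3 / k1
    h0 = 1.0 - k1 * k3 / (3.0 * k2 ** 2)
    mean = (1.0 + h0 * (h0 - 1) * b2 / (2 * k1)
            + h0 * (h0 - 1) * (h0 - 2) * (4 * b3 + 3 * (h0 - 3) * b2 ** 2)
            / (24 * k1 ** 2))
    var = (h0 ** 2 * b2 / k1
           + h0 ** 2 * (h0 - 1) * (2 * b3 + (3 * h0 - 5) * b2 ** 2)
           / (2 * k1 ** 2))
    # A small value of U corresponds to a large value of -log U.
    z = NormalDist().inv_cdf(1.0 - q)
    y = k1 * (z * math.sqrt(var) + mean) ** (1.0 / h0)
    return math.exp(-y)


print(lower_percentile_of_u(3, 8, 19, 0.05))
```

For $p_1 = 3$, $p_2 = 8$, $N = 19$ the result should land near the 5% point 0.04107 listed in Table 2.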


3.2 Testing the Sphericity Hypothesis. Let $X_1, \ldots, X_N$
be a random sample from a $p$-variate normal population with
mean $\mu$ and covariance matrix $\Sigma$. The hypothesis that the $p$
components of the random vector $X$ are independent with the
same variance, i.e., $H_0$: $\Sigma = \sigma^2 I$, $\sigma^2 > 0$ unknown, is known as
the sphericity hypothesis. The hypothesis also arises in the
analysis of data from experiments consisting of repeated measurements.
In these experiments, the measurements on a subject are
assumed to have compound symmetry, i.e., have the same variances
and same correlations. The problem of testing the hypothesis of
compound symmetry $H$: $\Sigma = \sigma^2(\rho\mathbf{1}\mathbf{1}' + (1-\rho)I)$ for the covariance
structure of $(p+1)$ repeated measurements $Y$ can be reduced to
the sphericity hypothesis by an orthogonal transformation
$Y' \to Y'(\mathbf{1}/\sqrt{p+1} : \Gamma)$, where $\mathbf{1}$ is the vector of 1's. $Y$ satisfies
compound symmetry if and only if $X = \Gamma'Y$ satisfies the sphericity
hypothesis. The likelihood ratio criterion for the sphericity
hypothesis was proposed by Mauchly (1940) as

$$U = \Lambda^{2/N} = |S|/\{(\mathrm{tr}\,S)/p\}^p,$$

where $S$ is the covariance matrix of the sample of size $N$. He
also derived its null distribution for $p = 2$. The exact null
distribution of $U$ for $p = 3$, 4, and 6 was obtained by Consul
(1967b). The 5% and 1% points for $p = 4(1)10$ were given by
Nagarsanker and Pillai (1973). The series approximation due to
Box can be expressed in this case as follows.


Box-Approximation. Let $e = p(p+1)/2 - 1$, $f = n - (2p^2 + p + 2)/(6p)$
and $g = (p+2)(p-1)(p-2)(2p^3 + 6p^2 + 3p + 2)/(288p^2)$ for $n = N - 1$.
Then,

$$\Pr(-f\log U \le z) = \Pr(\chi^2_e \le z) + g f^{-2}\{\Pr(\chi^2_{e+4} \le z) - \Pr(\chi^2_e \le z)\} + O(n^{-3}). \qquad (12)$$


It is well known (e.g. see Srivastava and Khatri, 1979)
that the distribution of $U$ under $H_0$ is the same as that of
the product $\prod X_i$, where $X_i$ ($i = 1, 2, \ldots, p-1$) are independent
beta random variables distributed according to $B\{X_i; (n-i)/2,\ i(p+2)/(2p)\}$,
$n = N - 1$. Again, we can obtain the cumulants of
$U' = -2(\log\Lambda)/N$ using (6) with $a_i = (n-i)/2$, $b_i = i(p+2)/(2p)$
and $k = p - 1$. Hence, the probabilities and the percentiles of
the likelihood ratio $\Lambda$ may be obtained from (8) and (9),
respectively.


4. NUMERICAL COMPARISONS 


The quality of the normal approximations for the two multi- 
variate likelihood ratio statistics discussed in the previous 
section and the other two approximations, was examined by com- 
puting the probabilities corresponding to the tabulated percen- 
tiles of the statistics. Thus, in the case of the null distri- 
bution of A for testing independence, the approximation due 
to Box (10), due to Rao (11), and the normal approximation given 
in Section (3.1) were used to compute the probabilities corres- 
ponding to all 5% and 1% points of A given in Pearson and 
Hartley (1972, p. 99 and 333). Similarly, in case of the spheri- 
city problem, all percentiles given by Nagarsanker and Pillai 
(1973) were used to examine the approximation due to Box given by 
(12) and the relevant normal approximation. In both cases, the 
series approximation due to Box was used in two steps: 1) only 
the first term; and 2) all terms given in (10) and (12). Also 
the percentiles approximated using the normal approximations were 
compared with the competing approximations using the first term 
of the Box series and the F-approximation. A selection of errors,
i.e., (Approximation - Exact value) $\times 10^5$, in various cases is
presented in Tables 1 and 2.


Conclusions. Let New, Rao, Box 1 and Box 3 denote the normal
approximation, the F-approximation due to Rao, the first term
approximation due to Box and the three term approximation due to
Box, respectively. From Tables 1 and 2 it may be observed that:
(i) Rao, Box 1 and Box 3 have errors in the second through fifth
decimal place; they are especially large for small $N$ and
decrease rapidly as $N$ increases. The normal approximation
has errors in the fourth or fifth decimal place. (ii) As $p_1$, $p_2$
TABLE 1. (Printed sideways in the source; the entries are not legible in this copy.)

Error in Probability = (Approx. value - $\alpha$) $\times 10^5$.

Error in Percentile = (Approx. value - $\lambda$) $\times 10^5$.

TABLE 2. Errors of the approximations for the likelihood ratio 
statistic for testing independence between two sets of variates 


a= .05 
Pl P2 N r Errors* 
Percentiles Probatilities 
New Rao Boxl New Rao Boxl Box3 
(bet eo a Te 2 Be ee ee 
3 Sie l2 0.00001 0 0 67 -19 -1537 -4991 -4744 
8 19 0.04107 4 7 698 -17 -24 -1776 -50 
2230 0 OOL OT 0 al 356 0 -143 -4936 -3893 
DIE BE OOM PAL 0 0 760 -4 0 -3842 -725 
o Se 138.0002 L7 0 5 161 2 -218 -3340 -485 
oy 745) O'R S Yala 1 3 352 -8 -17 -1355 -16 
16 26 0.00019 0 0 50 12 -420 -4759 -2732 
16 33 0.00568 0 1 ZAlS' -6 -48 -3160 -316 
i, & 5195 .0,.00021: 0 in 37 -9 -618 -4225 -1437 
8 23 0.00356 0 s} iNe¥s} -3 -122 -2657 -199 
10 21 0.00007 0 0 18 25 -746 -4531 -2059 
Oem ASE AOS OOS 7) 0 2 81 9 -155 -3153 -358 
a= .01 
3 85 12 0.00000 0) 0 LZ 6 -478 -999 -997 
8195-0, 02261 0 7 514 0 -9 -487 -29 
22m 30% 0.00043 0 0 209 0 -44 -997 -935 
22 37 0.00990 -2 -1 Sw ie. 6 5 -872 -303 
5) 8 18 0.00085 0 3 85 3 -72 -803 -223 
8 25 0.02086 0 6 270 -2 -11 -374 -15 
16 26 0.00007 0 0 25 -10 -143 -985 -784 
Lot 333.0,00327 0 0 A 3 -11 -753 -147 
if 8 19 0.00007 0 0) 16 1 -192 -932 -513 
8 23. 0. 0017 4 0 De 81 -1 -42 -665 -102 
LOMS21 OF 00002 0 0 8 8 -234 -965 -659 
TOR 250. OOOTS 0 ik 48 5 -48 -756 -164 


* Error in Probability = (Approx. value - $\alpha$) $\times 10^5$.


Error in Percentile = (Approx. value - $\lambda$) $\times 10^5$.
or $p$ increases, errors due to Rao, Box 1 and Box 3 increase 
while those due to the normal approximation either decrease or 
maintain the same level. Overall, the normal approximation is 
superior for small N and is comparable with the others when 
N is large. 
SUMMARY. The Tadikamalla-Johnson unlimited system $L_U$ based on the
logistic density may be fitted through the equivalence of the
skewness ($\sqrt{\beta_1}$) and kurtosis ($\beta_2$) in the model and the
distribution approximated. One-shot accurate approximations compact
enough for small calculators and covering an extensive domain are
described.
1. GENESIS OF APPROXIMATIONS


$L_U$ has been described recently by Tadikamalla and Johnson
(1979) and consists of the transformation

$$Z = \gamma + \delta\sinh^{-1} Y, \qquad (1)$$

where $Z$ has the logistic density

$$f_L(z) = \exp(z)\{1 + \exp(z)\}^{-2} \qquad (-\infty < z < \infty)$$

and simple cumulative function

$$F(z) = \{1 + \exp(-z)\}^{-1}.$$
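They give the first four non-central moments as

$$\mu_1' = -\theta\,\mathrm{cosec}\,\theta\,\sinh\Omega, \qquad (\theta = \pi/\delta,\ \Omega = \gamma/\delta)$$

$$\mu_2' = \theta\,\mathrm{cosec}\,2\theta\,\cosh 2\Omega - 1/2,$$

$$\mu_3' = -(3/4)\,\theta\,(\mathrm{cosec}\,3\theta\,\sinh 3\Omega - \mathrm{cosec}\,\theta\,\sinh\Omega),$$

$$\mu_4' = (1/2)\,\theta\,(\mathrm{cosec}\,4\theta\,\cosh 4\Omega - 2\,\mathrm{cosec}\,2\theta\,\cosh 2\Omega) + 3/8,$$

from which the central moments can be derived and the skewness
($\sqrt{\beta_1} = \mu_3/\mu_2^{3/2}$) and kurtosis ($\beta_2 = \mu_4/\mu_2^2$).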
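
The moment formulas translate directly into a forward map from $(\gamma, \delta)$ to $(\sqrt{\beta_1}, \beta_2)$. The Python sketch below is an illustration under the reading $\theta = \pi/\delta$, $\Omega = \gamma/\delta$; the function names are hypothetical. It requires $\theta < \pi/4$ (that is, $\delta > 4$) so that the fourth moment exists. With $\gamma = 0$ and $\delta$ large the kurtosis should approach 4.2, the value for the logistic distribution itself.

```python
import math


def lu_noncentral_moments(gamma: float, delta: float):
    """First four non-central moments of Y for the L_U system."""
    theta = math.pi / delta
    omega = gamma / delta
    if not 0.0 < theta < math.pi / 4:
        raise ValueError("fourth moment requires delta > 4")

    def cosec(x: float) -> float:
        return 1.0 / math.sin(x)

    m1 = -theta * cosec(theta) * math.sinh(omega)
    m2 = theta * cosec(2 * theta) * math.cosh(2 * omega) - 0.5
    m3 = -0.75 * theta * (cosec(3 * theta) * math.sinh(3 * omega)
                          - cosec(theta) * math.sinh(omega))
    m4 = (0.5 * theta * (cosec(4 * theta) * math.cosh(4 * omega)
                         - 2 * cosec(2 * theta) * math.cosh(2 * omega)) + 0.375)
    return m1, m2, m3, m4


def lu_skewness_kurtosis(gamma: float, delta: float):
    """Return (sqrt(beta_1) with its sign, beta_2) from the central moments."""
    m1, m2, m3, m4 = lu_noncentral_moments(gamma, delta)
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2


print(lu_skewness_kurtosis(0.0, math.pi / 0.05))
print(lu_skewness_kurtosis(5.0, 10.0))
```

Solving (2) amounts to inverting this map, which is what the approximations in the following sections avoid doing iteratively.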


The parameters $\gamma$, $\delta$ in (1) are determined from

$$\sqrt{\beta_1}(\gamma, \delta) = \sqrt{\beta_1}, \qquad \beta_2(\gamma, \delta) = \beta_2 \qquad (2)$$

for a given $(\sqrt{\beta_1}, \beta_2)$, assuming a solution exists. The solution
is completed for a working variate $X$ by setting $Y = (X - \xi)/\lambda$,
the parameters $\xi$, $\lambda$ being determined from the equivalence of
location and scale. Since (2) requires the algorithmic solution
of bivariate equations beyond the capacity of calculators, with
a consequent resort to tabulation (and possibly interpolation
problems), other approaches seem worthwhile.


In deriving an algorithm for the corresponding problem with
$S_U$, Johnson (1965) noted the linearity of $\delta$-contours in the
$(\beta_1, \beta_2)$ plane and also the stability of the function
$\{\beta_2 - \tfrac{1}{2}(\omega^4 + 2\omega^2 + 3)\}/\beta_1$. Bowman and Shenton (1979) have used
the latter, approximated by a rational fraction in $\beta_1$ and $\beta_2$,
to determine $\omega$ ($= \exp(1/\delta^2)$) with very acceptable accuracy over
a much larger domain than the Johnson tabulations, the approximation
being exact for $\beta_1 = 0$.

We follow this pattern for $L_U$ and are led to a one-shot
solution for $\delta$, and a solution for $\gamma$ derived from this.


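2. AN EXPLICIT APPROXIMATION FOR $\delta$


We first of all considered the solution when $\beta_1 = 0$, so
that $\Omega = \gamma/\delta = 0$, and $\theta = \pi/\delta$ is to be found (see Tadikamalla
and Johnson, 1979). The equation is

$$\beta_2 = f(\theta), \qquad (3)$$

where

$$f(\theta) = \{\tfrac{1}{2}\theta(\mathrm{cosec}\,4\theta - 2\,\mathrm{cosec}\,2\theta) + \tfrac{3}{8}\}/(\theta\,\mathrm{cosec}\,2\theta - \tfrac{1}{2})^2.$$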


As is the case for $S_U$ (Bowman and Shenton, 1979), we now
consider the effect of a small skewness, and entertain an expansion
of $\beta_2 - f(\theta)$ as $a_1(\theta)\beta_1 + a_2(\theta)\beta_1^2 + \cdots$. The next step
was to consider contours of $g(\theta) = [\beta_2 - f(\theta)]/\beta_1$; these turned
out to be almost linear (Figure 1), with almost constant gradient
with respect to $\beta_2$, in a domain bounded by $\beta_1 = 0$, $4.2 \le \beta_2 < \infty$, and
the log-logistic line $L_L$ (this $L_L$ line is almost linear provided
$\sqrt{\beta_1}$ is less than 2 approximately). However, even if a
simple approximation to $g(\theta)$ was found, we should still have a
transcendental equation to solve for $\theta$. A closer look at (3)
suggests the possibility of an expansion for $\theta^2$ (not $\theta$) as a
polynomial in $\beta_1$ and $\beta_2^* = \beta_2 - 4.2$, for $\beta_1 = 0$ and $\beta_2^* = 0$
is an extreme point on $L_U$ for which $\theta = 0$. It is plausible
then to consider the expansion

$$\theta^2 = a_{10}\beta_1 + a_{01}\beta_2^* + a_{20}\beta_1^2 + \cdots. \qquad (4)$$

However we point out that $L_U$ is such that the 4th moment
involves cosec $4\theta$, so that $\theta < \pi/4$ is needed for the existence of
$\beta_2$; this shows that the extreme value of $\sqrt{\beta_1}$ is

$$k = 2(12 - 6\pi + \pi^2)/\{\sqrt{\pi}\,(4 - \pi)^{3/2}\} = 4.2847\ldots \qquad (5)$$

when $\beta_2 \to \infty$. Hence an improvement on (4) is

$$\theta^2 = \sum A^{(1)}_{rs}\beta_1^r\beta_2^{*s} \Big/ \sum A^{(2)}_{rs}\beta_1^r\beta_2^{*s}$$

with $A^{(1)}_{00} = 0$ and $A^{(2)}_{00} = 1$, which can remain finite as $\beta_2 \to \infty$
provided there are compensating terms in $\beta_2^*$ in numerator and


denominator. Finally then we use the approximation, 


2 si * * 


where, for i=1,2, 


The eighteen coefficients in (6) may be determined by using 
linearized least squares over a representative set of grid points 
for the triad (8, > Bos 6); we used 1000 points on a grid for 


which vB, == (00, £0: ee Oe 2006 2):5 eet Ones 
these 8 


2 we computed, for 


values, the L value 8 and used the integer part 


a 1 2L 
Of “I+ Bor incremented by unity up to 8, = 75. The coefficients 


(Table 1) show that the cubic terms are not playing a dominant
role unless $\beta_1$ and (or) $\beta_2$ are large.


As an obvious check on (6) one can assume $\beta_1$ small and
$\beta_2 = 4.2$ approximately in the moment equations, and mathematically
derive the approximation

$$\theta^2 = \{p(\beta_2 - 4.2) + q\beta_1\}/\{1 - r(\beta_2 - 4.2) + s\beta_1\}$$

where $p = 175/864$, $q = -425/1152$, $r = 35/648$, $s = 85/864$. In
TABLE 1: Rational fraction. (Printed sideways in the source; the coefficient entries are not legible in this copy. Parenthetic entries taken negatively give the power of 10 by which the corresponding entry is to be multiplied.)
(6) $A^{(1)}_{10} = -0.368918$, $A^{(1)}_{01} = 0.202542$, compared to $q = -0.368924$,
$p = 0.202546$, in excellent agreement.

3. THE SOLUTION FOR $\Omega$


Knowing $\theta$ in (2) we solve for $t = \sinh\Omega$ from the
kurtosis equations. Thus, defining $c_s = \mathrm{cosec}\,s\theta$ for simplicity
and with $\theta = \hat\theta$,

$$a = 2\theta c_2 - \theta^2 c_1^2, \qquad b = \theta c_2 - 1/2,$$
$$p = 4\theta c_4 - 12\theta^2 c_1 c_3 + 12\theta^3 c_1^2 c_2 - 3\theta^4 c_1^4,$$
$$q = 4\theta c_4 - 2\theta c_2 - 9\theta^2 c_1 c_3 + 6\theta^3 c_1^2 c_2,$$
$$r = (1/2)\theta c_4 + 3/8 - \theta c_2,$$

we have

$$t = [(2ab\beta_2 - q + \sqrt{H})/(2p - 2a^2\beta_2)]^{1/2}, \qquad (7)$$

where $H = q^2 - 4pr + 4\beta_2(ra^2 + pb^2 - qab)$,
and $\hat\Omega = \ln[t + \sqrt{t^2 + 1}]$.

Since the kurtosis equation is now satisfied, the only source of
error in the solution for $\hat\Omega$ arises from $\hat\theta$.


Numerical results show that there is serious loss of 
accuracy in (7) when vB, is small. In this case it is advisable 


to use the equation for YB, namely 


vB, = t(at? + B)/(at? + py 3/2 (8) 
where $t = \sinh\Omega$, $a$ and $b$ are given in (7), and 


MSPS is 

A = 60 C4Cy - 20 CG 38c., 

h = 50 ohaer asec Uren oee a (@<6) 
ee ul 4 a’ 


The obvious method to solve (8) would be to express it as a
cubic in $t^2$ and use an appropriate subroutine. However it
turns out to be simpler to use the iterative scheme


t= vB, Cat? + ye? f (ae? ty whey 9) 


with ci YB... 


This converges rapidly for small YB. 
To complete the solution, having found values for 96 and 
2, we have, defining the mean and variance of the working 9 


variate X as Vi and V5 respectively, with Uy = Uy - Uy» 


2 ¥(V5/uy) and & = va Muy 


4, DOMAIN OF VALIDITY 


The solution $(\hat\theta, \hat\Omega)$ in (6), (7), and (9) is designed for
the domain $\beta_1 = 0$, $4.2 < \beta_2 < 75$ and a slightly modified $L_L$
line. To define the latter, we use the approximation


3 g3/2 


Bo = (a) eo a, vB, + a8, + a8) y/(k - vB.) 


to define 8, on L- The coefficients are 


17.64798988, a -2. 364654939, 


yy 
i] 


596129943, a 


ie) 
i] 


-0. 3554127901, 
with $k$ defined in (5). The numerical error in this formula
for $4.2 < \beta_2 < 75$ is less than 0.66%; the formula was derived by
least squares over 65 points for $\theta = 0.06(0.01)0.70$.


The modified $L_L$ line ($L_L^*$) is determined by adding 2%
to $\beta_2$ for a given $\beta_1$.


FIG. 1: A domain of validity and contours of constant $\theta$. (The
shaded area is the domain covered by the Tadikamalla and Johnson
tabulation.)


5. ERROR ANALYSIS 


To check accuracy, evaluate $(\hat\theta, \hat\Omega)$ for a given couplet
$(\beta_1, \beta_2)$ in the domain of validity. Insert these approximate
solutions in the moment equations and evaluate feedback values
$\beta_{1F}$ and $\beta_{2F}$ (Table 2). It will be seen that the solutions
achieve excellent accuracy.


Our assessment for the domain 8 


MID B, <S/pe 


io 
(a) $\sqrt{\beta_1} > 0.2$ (i.e., using (7) for $\hat\Omega$)
5 to 9 significant digits for $\sqrt{\beta_{1F}}$,
8 to 10 significant digits for $\beta_{2F}$;


(b) $0 < \sqrt{\beta_1} < 0.2$ (i.e., using (9) for $\hat\Omega$)
4 correct decimal digits for $\sqrt{\beta_{1F}}$,
5 to 9 correct digits for $\beta_{2F}$.


It should be pointed out that all numerical work has been 
carried out in double precision arithmetic on IBM system 360 
model 91, and that error assessments are related to this imple- 
mentation. 


The solutions provided are accurate enough for most prac- 
tical situations and can be programmed on portable calculators. 
There will of course be a slight loss of accuracy but not serious 
if 10-digit input is used with scientific notation throughout. 


For more restrictive computer facilities the ten-point 
formula in Table 1 of 9 can be used followed by (7) and (9). 
The domain now is roughly that of the Tadikamalla and Johnson
tabulation. Feedback values of $\sqrt{\beta_1}$ and $\beta_2$ can be in error
by about 0.05%.
A METHOD FOR THE EVALUATION OF CUMULATIVE
PROBABILITIES OF BIVARIATE DISTRIBUTIONS USING THE 
SUMMARY. A technique is developed for the approximation of
bivariate cumulative distribution function values through the 
use of the bivariate Pearson family when moments of the distri- 
bution to be approximated are known. The method utilizes a fac- 
torization of the joint density function into the product of a 
marginal density function and an associated conditional density, 
permitting the expression of the double integral in a form amen- 
able to the use of specialized Gaussian-type quadrature tech- 
niques for numerical evaluation of cumulative probabilities. 
Such an approach requires moments of truncated Pearson distri- 
butions, for which a recurrence relation is presented, and moments 
of the conditional distributions for which quartic expressions 
are used. It is shown that results are of high precision when 
this technique is employed to evaluate cumulative distribution 
functions that are of the Pearson class. 


KEY WORDS. Bivariate distribution, Pearson system, evaluation of 
distribution functions, quadrature. 
1. INTRODUCTION

Let $f_{Y_1,Y_2}(y_1,y_2)$ represent an arbitrary joint density
function (probability function in the discrete case) of the random
variables $Y_1$ and $Y_2$. It is desired to obtain approximations
for values of the joint distribution function

$$F_{Y_1,Y_2}(y_1,y_2) = \int_{L_1}^{y_1}\int_{L_2}^{y_2} f_{Y_1,Y_2}(t_1,t_2)\,dt_2\,dt_1 = \Pr\{Y_1 \le y_1,\ Y_2 \le y_2\},$$

where $L_1$ is the lower limit for $Y_1$ and $L_2$, possibly a function
of $Y_1$, is the lower limit for $Y_2$. Standard techniques


of approximation of bivariate cumulative distribution functions 
include the bivariate normal approximation (Johnson, 1949) and 
the bivariate series of the Edgeworth or Gram-Charlier type 

(Ord, 1972; Mardia, 1970). In univariate distributions, the 
choice of the nearest member of the Pearson class leads to very 
satisfactory approximations (cf. White, 1960), if programs for 
precise evaluation are available (Bouver & Bargmann, 1974, 1978). 


If the method of approximation of the cumulative probabil- 
ities is to be generalized, a wide class of bivariate distri- 
butions should be employed. We propose to use a rather exten- 
sive family of distributions, the bivariate Pearson family (cf. 
Elderton & Johnson, 1969; van Uven, 1947, 1948), from which is 
drawn a member appropriate for the approximation. After a member 
of the bivariate Pearson class has been chosen which is nearest
the distribution under consideration (in terms of moments up to 
eighth order), a technique of evaluation is employed which 
involves marginal and conditional distributions of the univariate 
Pearson class. The respective double integrals are evaluated by 
a specialized Gaussian quadrature technique which uses the trun- 
cated marginal distribution as kernel. Thus, all expressions can 
be obtained to a very high degree of precision (ten places for 
members of the bivariate Pearson class) utilizing merely the 
univariate Pearson programs. 


2. MARGINAL-CONDITIONAL FACTORIZATION APPROACH 


Let $f(x_1,x_2)$ represent a member of the bivariate Pearson
family. This joint density may be factored as the product of
$f_1(x_1)$, the marginal density for $X_1$, and $f_{2|1}(x_2|x_1)$, the
associated conditional density function of $X_2$ given that
$X_1 = x_1$. Without loss of generality, assume that $X_1$ and $X_2$
are standardized with respect to location and scale.


Mardia (1970) has shown that the conditional distributions 
belong to the univariate Pearson family whenever the joint dis- 
tribution is of the bivariate Pearson class. Although, in the 
general case, a similar result has not been proven for the mar- 
ginal distribution (nor do there appear to be counter examples), 
the Pearson family contains enough joint distributions with mar- 
ginals which are of the Pearson type to justify the use of that 
(proper or improper) subclass as an approximating family. Indeed, 
successful application of this procedure only requires that the 
marginal and conditional distributions be of such nature that 
univariate Pearson members provide good approximations. 


The cumulative probability, $\Pr\{X_1 \le A_1,\ X_2 \le A_2\}$, may
be written as

$$F(A_1,A_2) = \int_{L_1}^{A_1} f_1(x_1) \int_{L_2(x_1)}^{A_2} f_{2|1}(x_2|x_1)\,dx_2\,dx_1. \qquad (1)$$

The inner integral, being a function of $x_1$ and $A_2$, may be
denoted by $H(x_1,A_2)$ so that (1) becomes

$$F(A_1,A_2) = \int_{L_1}^{A_1} f_1(x_1)\,H(x_1,A_2)\,dx_1.$$


Written in this way, it is evident that the integral may be
evaluated by utilizing nonstandard Gaussian quadrature (cf.
Lether, 1978) treating $f_1(x_1)$ as kernel provided H can be
evaluated and provided the "truncated" marginal moments,

$$E_{A_1}\{X_1^r\} = \int_{L_1}^{A_1} x_1^r f_1(x_1)\,dx_1,$$

are obtainable. [Here the constant divisor $F(A_1)$ is omitted.]

The approximation for $F(A_1,A_2)$ is merely

$$F(A_1,A_2) \approx \sum_{i=1}^{N} W_i\,H(x_{1i},A_2) \qquad (2)$$

where the $W_i$ are appropriate Gaussian weights and the $x_{1i}$ are
Gaussian points, all dependent on the truncated marginal moments.
With the marginal density function $f_1(x)$ serving as the
kernel, we may construct a sequence of orthogonal polynomials
$\{P_n\}$ where

$$P_n(x) = x^n + a_{n-1}^{(n)} x^{n-1} + \cdots + a_1^{(n)} x + a_0^{(n)},$$

and

$$\int_{L_1}^{A_1} P_m(x)\,P_n(x)\,f_1(x)\,dx = \begin{cases} 0, & \text{if } m \ne n, \\ \lambda_n, & \text{if } m = n. \end{cases}$$


The Christoffel difference equation (Hamming, 1973) defines a 
recurrence relation useful for obtaining this sequence, it may be 
expressed as 


ete, ia 
nee aptly iS She a, Pets oP aa (3) 
n n-1 
where 
n+1. . (n) ; (n) (n) 
os E{x >} y ented = | Mon He n+2 7 49 mMit1 
and 
a (n) (n) (n) 
Ao Bog th Byer Baga brs" he predias Sy ies 


Without loss of generality, $P_0$ may be taken as unity. The
constant in $P_1(x) = x + a_0^{(1)}$ may be determined from the relation

$$0 = \int_{L_1}^{A_1} \left(x + a_0^{(1)}\right) f_1(x)\,dx = m_1 + a_0^{(1)} m_0.$$

Equation (3) may then be used to generate the rest of the sequence.


In accordance with the Gaussian quadrature technique, the
Gaussian points $x_{1i}$ are the zeros of $P_N(x)$, where N is the
number of points used in the summation of (2).


To determine the weights $W_i$, consider

$$E_{A_1}\{P_n(x)\} = \int_{L_1}^{A_1} f_1(x)\,P_n(x)\,dx = \sum_{i=1}^{N} W_i\,P_n(x_{1i})$$

for $n = 0,1,\ldots,N-1$. We obtain the following system of linear
equations:

$$\sum_{i=1}^{N} W_i = m_0 \quad\text{and}\quad \sum_{i=1}^{N} W_i\,P_n(x_{1i}) = 0, \quad\text{for } n = 1,2,\ldots,N-1.$$


Solving this system for the weights $W_i$ requires only simple
synthetic division (Bargmann & Halfon, 1977). One aspect of this
procedure is that if H were a polynomial of degree less than
2N, then the formula (2) would be exact. Thus, given the
truncated moments $E_{A_1}\{X_1^r\}$ up to order 2N-1, the weights $W_i$
and the points $x_{1i}$ may be determined.
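
The construction of points and weights from moments can be sketched numerically. The following is a minimal illustration, not the authors' program: instead of the Christoffel recurrence and synthetic division, it assumes that solving a Hankel system for the coefficients of $P_N$ and a Vandermonde system for the weights is acceptable for small N. The function name `gauss_from_moments` and the uniform-kernel illustration are hypothetical choices made for this sketch.

```python
import numpy as np


def gauss_from_moments(moments, n_points):
    """Gaussian points and weights for a kernel known only by its moments.

    moments[k] is the k-th (truncated) moment m_k, k = 0, ..., 2N-1.
    """
    m = np.asarray(moments, dtype=float)
    n = n_points
    if m.size < 2 * n:
        raise ValueError("need moments up to order 2N-1")
    # Orthogonality of the monic P_N to 1, x, ..., x^(N-1) gives the Hankel
    # system  sum_j a_j m_(j+k) = -m_(N+k),  k = 0, ..., N-1.
    hankel = np.array([[m[j + k] for j in range(n)] for k in range(n)])
    coeffs = np.linalg.solve(hankel, -m[n:2 * n])
    # np.roots expects the highest power first: x^N + a_(N-1) x^(N-1) + ... + a_0.
    nodes = np.sort(np.roots(np.concatenate(([1.0], coeffs[::-1]))).real)
    # The weights reproduce the first N moments: sum_i W_i x_i^k = m_k.
    vandermonde = np.vander(nodes, n, increasing=True).T
    weights = np.linalg.solve(vandermonde, m[:n])
    return nodes, weights


# Illustration: uniform kernel on [0, 1], whose moments are m_k = 1/(k+1).
nodes, weights = gauss_from_moments([1.0 / (k + 1) for k in range(6)], 3)
```

For the uniform kernel this reduces to the ordinary three-point Gauss-Legendre rule mapped to [0, 1], which gives a convenient check. The Hankel system becomes ill-conditioned as N grows, so a recurrence-based construction such as the one described in the text is preferable for larger N.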

We have assumed that $X_1$ and $X_2$ are standardized random
variables and that the inner integral of (1), $H(x_1,A_2)$, represents
the distribution function of a Pearson distribution whose
type is determined by the conditional moments, $E\{X_2^s|x_1\}$,
$s = 1,2,3,4$. $H(x_1,A_2)$ may be expressed in the form of the
distribution function of a standardized univariate Pearson variable
as

$$H(x_1,A_2) = \int_a^b f^*_{2|1}(z|x_1)\,dz,$$

where $a = [L_2(x_1) - \mu_{2|1}]/\sigma_{2|1}$ and $b = [A_2 - \mu_{2|1}]/\sigma_{2|1}$,
and where $f^*_{2|1}(z|x_1)$ represents the standardized conditional
density function and where $\mu_{2|1}$ and $\sigma_{2|1}$ denote the conditional
mean and standard deviation, both depending explicitly on
$x_1$. We may further write, denoting the conditional c.d.f. by
$F^*_{2|1}$,

$$H(x_1,A_2) = F^*_{2|1}\!\left(\frac{A_2 - \mu_{2|1}}{\sigma_{2|1}};\ \sqrt{\beta_{1|x_1}},\ \beta_{2|x_1}\right),$$

where $\sqrt{\beta_{1|x_1}}$ and $\beta_{2|x_1}$ are the third and fourth standardized
conditional moments, and thus emphasize the dependence on the
first four conditional moments. Therefore, (2) becomes

$$F(A_1,A_2) \approx \sum_{i=1}^{N} W_i\,F^*_{2|1}\!\left(\frac{A_2 - \mu_{2|x_{1i}}}{\sigma_{2|x_{1i}}};\ \sqrt{\beta_{1|x_{1i}}},\ \beta_{2|x_{1i}}\right) \qquad (4)$$

where the $W_i$ and $x_{1i}$ are dependent on $A_1$ through the truncated
marginal moments.


In the next two sections we shall detail the methods for 
obtaining the required truncated and conditional moments using 
marginal and mixed moments of the distribution whose probabilities 
are to be approximated. 


3. TRUNCATED MARGINAL MOMENTS 


Let X have a Pearson distribution with probability density
function f(x). The truncated moments (again with the
constant divisor omitted) are given by

$$E_T\{X^r\} = \int_{\ell}^{T} x^r f(x)\,dx, \quad\text{where } \ell \le X.$$

When $r = 0$, $E_T\{X^r\}$ is just the cumulative distribution function
$F(T)$, while for other values of r, a recurrence relation may
be established to provide values for the truncated moments.


Cohen (1951) derived such a recurrence relation for doubly
truncated Pearson distributions. As a special case, in a form
slightly different from Cohen's, the relations are

$$E_T\{X\} = (1 - 2c_2)^{-1}\left[-f(T)\,(c_0 + c_1 T + c_2 T^2) - (a - c_1)\,F(T)\right] \qquad (5)$$

$$E_T\{X^r\} = [1 - (r+1)c_2]^{-1}\left[-T^{r-1} f(T)\,(c_0 + c_1 T + c_2 T^2) + c_0 (r-1)\,E_T\{X^{r-2}\} - (a - c_1 r)\,E_T\{X^{r-1}\}\right] \qquad (6)$$

where $a, c_0, c_1, c_2$ are the coefficients of the Pearson differential
equation

$$\frac{1}{f}\frac{df}{dx} = -\frac{a + x}{c_0 + c_1 x + c_2 x^2}.$$
These coefficients may be expressed in terms of the moments of
X (Elderton & Johnson, 1969). In the case where the mean
($E(X) = \mu$) is zero, expressions for these coefficients are
(Johnson & Kotz, 1970)

$$c_0 = (4\beta_2 - 3\beta_1)\,\mu_2/D, \qquad a = c_1 = \sqrt{\beta_1}\,(\beta_2 + 3)\,\sqrt{\mu_2}/D,$$

and

$$c_2 = (2\beta_2 - 3\beta_1 - 6)/D, \qquad (7)$$

where $D = 10\beta_2 - 12\beta_1 - 18$ and where, with $\mu_i = E\{(X-\mu)^i\}$,
$\sqrt{\beta_1} = \mu_3/\mu_2^{3/2}$ and $\beta_2 = \mu_4/\mu_2^2$. If the origin is not taken at
the mean, say for the random variable $Y = X + \mu$, the truncated
marginal moments may be expressed in terms of those of X as
follows.

$$E_T\{Y^r\} = E_{T-\mu}\{(X+\mu)^r\} = \sum_{i=0}^{r} \binom{r}{i}\,\mu^{r-i}\,E_{T-\mu}\{X^i\}.$$


In addition, X has a Pearson distribution whose type is determined
by $\sqrt{\beta_1}$ and $\beta_2$. Hence, given moments of X to order
four, the values of $a$, $c_0$, $c_1$, $c_2$, $f(T)$, and $F(T)$ may all
be obtained. The latter two values are available through numerical
techniques and computer programs such as those by Bouver and
Bargmann (1974). The recurrence relation (6) will thus provide
all orders of truncated moments required for producing Gaussian
points and weights for the marginal distribution $f_1(x_1)$.
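
The recurrence (5)-(6) together with the coefficients (7) can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the density and distribution function values at T are assumed to be supplied by the caller (in the text they come from univariate Pearson programs), and the standard normal, a Pearson member with $\sqrt{\beta_1} = 0$ and $\beta_2 = 3$, is used merely as a convenient check because its truncated moments are known in closed form.

```python
import math


def pearson_coefficients(sqrt_beta1, beta2, mu2=1.0):
    """Coefficients (a, c0, c1, c2) of (7) for a variable with zero mean."""
    beta1 = sqrt_beta1 ** 2
    d = 10.0 * beta2 - 12.0 * beta1 - 18.0
    c0 = (4.0 * beta2 - 3.0 * beta1) * mu2 / d
    c1 = sqrt_beta1 * (beta2 + 3.0) * math.sqrt(mu2) / d
    c2 = (2.0 * beta2 - 3.0 * beta1 - 6.0) / d
    return c1, c0, c1, c2  # a = c1 when the origin is at the mean


def truncated_moments(t, pdf_t, cdf_t, coeffs, max_order):
    """Truncated moments E_T{X^r}, r = 0..max_order, by the recurrence (6)."""
    a, c0, c1, c2 = coeffs
    quad = c0 + c1 * t + c2 * t * t
    moments = [cdf_t]  # order 0 is F(T)
    for r in range(1, max_order + 1):
        lower = moments[r - 2] if r >= 2 else 0.0
        numerator = (-(t ** (r - 1)) * pdf_t * quad
                     + c0 * (r - 1) * lower
                     - (a - c1 * r) * moments[r - 1])
        moments.append(numerator / (1.0 - (r + 1) * c2))
    return moments


# Check against the standard normal, truncated above at T = 0.5.
T = 0.5
phi = math.exp(-0.5 * T * T) / math.sqrt(2.0 * math.pi)
Phi = 0.5 * (1.0 + math.erf(T / math.sqrt(2.0)))
normal_moments = truncated_moments(T, phi, Phi, pearson_coefficients(0.0, 3.0), 3)
```

For the normal case the recurrence reproduces the familiar results $E_T\{X\} = -\phi(T)$ and $E_T\{X^2\} = \Phi(T) - T\phi(T)$.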


4. CONDITIONAL MOMENTS 


Let the joint distribution of $X_1$ and $X_2$ be of the
Pearson class. Mardia (1970) has shown that the regression of
$X_2$ on $X_1$ (or $X_1$ on $X_2$) is linear. This idea may be
extended to show that $E\{X_2^s|x_1\}$ is of degree s in $x_1$. It was
noted earlier that the conditional distribution is Pearson, so
that $f_{2|1}(x_2|x_1)$ satisfies a differential equation of the form

$$\frac{1}{f_{2|1}(x_2|x_1)}\,\frac{\partial f_{2|1}(x_2|x_1)}{\partial x_2} = \frac{L(x_1,x_2)}{Q(x_1,x_2)} \qquad (8)$$

where L and Q are linear and quadratic functions, respectively,
of $x_1$ and $x_2$. Rearranging and multiplying (8) by $x_2^s$, then
integrating each side over the range of $X_2$, we arrive at

$$-E\left\{X_2^s\,\frac{\partial Q}{\partial x_2}\,\Big|\,x_1\right\} - s\,E\{X_2^{s-1} Q\,|\,x_1\} = E\{X_2^s L\,|\,x_1\}. \qquad (9)$$


Letting $L = a_0 + a_1 x_1 + a_2 x_2$ and $Q = b_0 + b_1 x_1 + b_2 x_2
+ b_{11} x_1^2 + b_{12} x_1 x_2 + b_{22} x_2^2$, and expanding (9), we obtain

$$-\left(b_2 E\{X_2^s|x_1\} + b_{12} x_1 E\{X_2^s|x_1\} + 2 b_{22} E\{X_2^{s+1}|x_1\}\right)$$

$$-\, s\left(b_0 E\{X_2^{s-1}|x_1\} + b_1 x_1 E\{X_2^{s-1}|x_1\} + b_2 E\{X_2^s|x_1\} + b_{11} x_1^2 E\{X_2^{s-1}|x_1\} + b_{12} x_1 E\{X_2^s|x_1\} + b_{22} E\{X_2^{s+1}|x_1\}\right)$$

$$= a_0 E\{X_2^s|x_1\} + a_1 x_1 E\{X_2^s|x_1\} + a_2 E\{X_2^{s+1}|x_1\}.$$


When s = 1, it is clear that $E\{X_2^2|x_1\}$ is a quadratic function
of $x_1$. With s = 2, $E\{X_2^3|x_1\}$ is a cubic function of $x_1$ and
with s = 3, $E\{X_2^4|x_1\}$ is a quartic function of $x_1$. The Mardia
result, i.e., that $E\{X_2|x_1\}$ is a linear function of $x_1$, is
merely the special case for s = 0. Thus, quartic functions
were chosen to provide conditional moment approximations since,
if f is a member of the bivariate Pearson family, these quartic
functions would be the exact moments.


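The (r,s)th mixed moment can be expressed using the marginal-
conditional factorization regardless of whether the joint distribution
is Pearson. We have

$$E\{X_1^r X_2^s\} = \int x_1^r f_1(x_1) \int x_2^s f_{2|1}(x_2|x_1)\,dx_2\,dx_1 = \int x_1^r f_1(x_1)\,E\{X_2^s|x_1\}\,dx_1$$

where the limits of integration extend over the domain of the
joint density function. Expressing $E\{X_2^s|x_1\}$ as a quartic
function of $x_1$, which in the case of non-Pearson distributions
may represent an approximation, we have, writing $\mu_{rs}$ for
$E\{X_1^r X_2^s\}$,

$$\mu_{rs} = \int f_1(x_1)\left\{w_0^{(s)} x_1^{r} + w_1^{(s)} x_1^{r+1} + w_2^{(s)} x_1^{r+2} + w_3^{(s)} x_1^{r+3} + w_4^{(s)} x_1^{r+4}\right\}dx_1$$

$$= w_0^{(s)}\mu_{r0} + w_1^{(s)}\mu_{r+1,0} + w_2^{(s)}\mu_{r+2,0} + w_3^{(s)}\mu_{r+3,0} + w_4^{(s)}\mu_{r+4,0}.$$

Letting r = 0,1,2,3,4, we obtain a persymmetric system of equations
that may be solved to obtain the conditional moment weights
$w_i^{(s)}$ (different from the weights $W_i$ of (4)). This system in
matrix form is

$$\begin{pmatrix}
\mu_{00} & \mu_{10} & \mu_{20} & \mu_{30} & \mu_{40} \\
\mu_{10} & \mu_{20} & \mu_{30} & \mu_{40} & \mu_{50} \\
\mu_{20} & \mu_{30} & \mu_{40} & \mu_{50} & \mu_{60} \\
\mu_{30} & \mu_{40} & \mu_{50} & \mu_{60} & \mu_{70} \\
\mu_{40} & \mu_{50} & \mu_{60} & \mu_{70} & \mu_{80}
\end{pmatrix}
\begin{pmatrix} w_0^{(s)} \\ w_1^{(s)} \\ w_2^{(s)} \\ w_3^{(s)} \\ w_4^{(s)} \end{pmatrix}
=
\begin{pmatrix} \mu_{0s} \\ \mu_{1s} \\ \mu_{2s} \\ \mu_{3s} \\ \mu_{4s} \end{pmatrix}$$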
or, $V w^{(s)} = m^{(s)}$. Letting s = 1,2,3,4, we get four such
matrix equations that may be combined into the single expression
$VW = M$, where the (i,j) element of W is denoted by $w_i^{(j)}$
and that of M is $\mu_{ij}$. So the conditional moment weights
$w_i^{(s)}$ can be expressed as $W = V^{-1}M$, W being the solution to
the linear system. The sth conditional moment approximation is
simply

$$E\{X_2^s|x_1\} = w_0^{(s)} + w_1^{(s)} x_1 + w_2^{(s)} x_1^2 + w_3^{(s)} x_1^3 + w_4^{(s)} x_1^4. \qquad (10)$$

These approximations may be used to determine the appropriate
Pearson type for the conditional distribution and its limits,
and, hence, permit the evaluation of the conditional cumulative
distribution function, $F^*_{2|1}$ of (4).


5. ILLUSTRATIONS 


Example 1: Dirichlet. The pdf of the bivariate Dirichlet distribution
with parameters $\theta_1$, $\theta_2$, and $\theta_3$ is given by

$$f_{Y_1,Y_2}(y_1,y_2) = \frac{\Gamma(\theta_1+\theta_2+\theta_3)}{\Gamma(\theta_1)\,\Gamma(\theta_2)\,\Gamma(\theta_3)}\; y_1^{\theta_1-1}\, y_2^{\theta_2-1}\, (1-y_1-y_2)^{\theta_3-1}$$

where $0 \le y_1 \le 1$, $0 \le y_2 \le 1-y_1$, and $\theta_1, \theta_2, \theta_3 > 0$. This
is a member of the bivariate Pearson class. Noncentral moments
of the Dirichlet may be expressed in terms of ascending factorial
powers as

$$\mu_{kj} = \frac{\theta_1^{[k]}\,\theta_2^{[j]}}{(\theta_1+\theta_2+\theta_3)^{[k+j]}}$$

where $\theta^{[k]} = \theta(\theta+1)\cdots(\theta+k-1)$. For any given values of
$\theta_1$, $\theta_2$, and $\theta_3$, moments may be computed so that the persymmetric
system (10) may be constructed. Taking, for example, $\theta_1 = 2$,
$\theta_2 = 3$, and $\theta_3 = 1$, the matrices of equation (10) are then

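$$V = \begin{pmatrix}
1 & 1/3 & 1/7 & 1/14 & 5/126 \\
1/3 & 1/7 & 1/14 & 5/126 & 1/42 \\
1/7 & 1/14 & 5/126 & 1/42 & 1/66 \\
1/14 & 5/126 & 1/42 & 1/66 & 1/99 \\
5/126 & 1/42 & 1/66 & 1/99 & 1/143
\end{pmatrix}$$

and

$$M = \begin{pmatrix}
1/2 & 2/7 & 5/28 & 5/42 \\
1/7 & 1/14 & 5/126 & 1/42 \\
3/56 & 1/42 & 1/84 & 1/154 \\
1/42 & 1/105 & 1/231 & 1/462 \\
1/84 & 1/231 & 5/2772 & 5/6006
\end{pmatrix}.$$

Thus, $W = V^{-1}M$ is

$$W = \begin{pmatrix}
3/4 & 3/5 & 1/2 & 3/7 \\
-3/4 & -6/5 & -3/2 & -12/7 \\
0 & 3/5 & 3/2 & 18/7 \\
0 & 0 & -1/2 & -12/7 \\
0 & 0 & 0 & 3/7
\end{pmatrix}.$$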
The columns of W provide the coefficients for the conditional
moment approximations (11). Since the conditional distribution
of $Y_2|y_1$ is, in fact, a Type I Pearson member, i.e., beta, with
parameters $\theta_2$ and $\theta_3$ and domain $0 \le y_2 \le 1-y_1$, the conditional
moment approximations are seen to be exact.
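
The persymmetric system for this example can be checked in exact rational arithmetic. The sketch below is illustrative only; the helper names are hypothetical and the small Gauss-Jordan solver is an assumption made for the sake of a self-contained example, not the procedure used by the authors. It builds V and M from the ascending-factorial moment formula and solves $VW = M$.

```python
from fractions import Fraction


def rising(theta, k):
    """Ascending factorial power theta^[k] = theta (theta+1) ... (theta+k-1)."""
    out = Fraction(1)
    for i in range(k):
        out *= theta + i
    return out


def dirichlet_moment(k, j, t1, t2, t3):
    """Noncentral moment E{Y1^k Y2^j} of the bivariate Dirichlet."""
    return rising(t1, k) * rising(t2, j) / rising(t1 + t2 + t3, k + j)


def solve_exact(a, b):
    """Solve A X = B by Gauss-Jordan elimination on Fractions."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def conditional_moment_weights(t1, t2, t3, degree=4, max_s=4):
    """Columns s = 1..max_s hold the quartic coefficients w_0^(s), ..., w_4^(s)."""
    size = degree + 1
    v = [[dirichlet_moment(i + j, 0, t1, t2, t3) for j in range(size)]
         for i in range(size)]
    m = [[dirichlet_moment(r, s, t1, t2, t3) for s in range(1, max_s + 1)]
         for r in range(size)]
    return solve_exact(v, m)


W = conditional_moment_weights(2, 3, 1)
```

Each column of `W` then holds the coefficients of $(1-y_1)^s$ multiplied by the sth moment of the beta distribution with parameters 3 and 1, which is the exactness property noted in the text.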


In Section 3, the expressions (7) for $c_0$, $c_1$, $c_2$, a, and
the recurrence formula (6) were given on the basis of
$\mu_{10} = 0$. As noted there, by letting $X_1 = Y_1 - \mu_{10}$ and
$S = T - 1/3$, the truncated marginal moments of $Y_1$ may be
expressed as follows.


Kes fey 1 yk 
Css = Epo! ec + Uj 9) } 
10 
k 
k k-i i 
= ) Cr: ) E 1 {xX } 
azo 2 10 T-Hy 6 sk 


i} 


k 
k, ly k-i 1 , it 3 
na ()@ | 14; (i+1) ts *) 20T(1-T) 


t,4,._1,2), 1¢- i-2 
, E We eee at © iE ay 1)E, {x} } 


+ anes | + &)* netx}} 


where, of course, $E_S\{X_1^0\} = E_T\{Y_1^0\} = F_{Y_1}(T) = I(T; 2,4)$, an
incomplete beta function, here. (In the general case, this quantity
is obtained as the cumulative distribution function of the appropriate
Pearson member chosen on the basis of $\sqrt{\beta_1}$ and $\beta_2$.)
The marginal distribution is Pearson, beta with parameters $\theta_1$
and $\theta_2 + \theta_3$, so the truncated marginal moment expressions are
exact.


In view of the exactness of the moments, error occurring in 
the approximation of Dirichlet probabilities (indeed, those of 
any member of the Pearson class) using the marginal-conditional 
factorization technique would be closely related to that intro- 
duced as a result of the Gaussian quadrature routine and to the 
precision with which the Pearson distribution functions are 
evaluated. The former is largely determined by the number of 
points used in the approximation. Here, for instance, F(.7, .3)
equals .1323 (exact) while the 3, 5, 7, 9, 11, 13, and 15-point 
approximations yield, respectively, 2, 4, 6, 7, 9, 11, and 

13 significant digit precision. For F(.3, .1), the 3-point for- 
mula yields .899988E-3 versus .0009 exact, while 11 digit accur- 
acy is attained with only seven points, .8999999999988E-3. 

The precision of the approximations for F(.8, -1) ranges from 
only one digit with three points to five digits at nine points 
(.639997 versus .64 exact). Using the programs of Bouver and 
Bargmann (1974), in which ten place precision is guaranteed, the 
accuracy of the univariate Pearson distribution function values 
proved to be quite adequate in all cases considered (see Parrish, 
1978). There was no indication that error in the double integral 
approximation was linked to evaluation of the univariate Pearson 
distribution functions. Indeed, for Pearson members, the 
precision typically obtained (eight to ten places or more) seldom 
required more than eleven Gaussian points. 
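
The whole marginal-conditional procedure can be illustrated for the Dirichlet case with $\theta_1 = 2$, $\theta_2 = 3$, $\theta_3 = 1$. The sketch below rests on several assumptions that are not part of the original programs: the truncated moments of the beta marginal are integrated in closed form rather than by the recurrence (6), the Gaussian rule is obtained from a Hankel solve rather than the Christoffel recurrence, and the exact conditional c.d.f. ($u^3$ for a beta variable with parameters 3 and 1) stands in for a univariate Pearson routine. All function names are hypothetical.

```python
import numpy as np


def gauss_rule(moments, n):
    """Gaussian points and weights from moments m_0, ..., m_(2n-1)."""
    m = np.asarray(moments, dtype=float)
    hankel = np.array([[m[j + k] for j in range(n)] for k in range(n)])
    coeffs = np.linalg.solve(hankel, -m[n:2 * n])
    nodes = np.sort(np.roots(np.concatenate(([1.0], coeffs[::-1]))).real)
    weights = np.linalg.solve(np.vander(nodes, n, increasing=True).T, m[:n])
    return nodes, weights


def truncated_beta24_moment(r, a1):
    """Integral of y^r * 20 y (1 - y)^3 over [0, a1] (beta(2, 4) marginal)."""
    terms = [(1, 2), (-3, 3), (3, 4), (-1, 5)]  # y (1 - y)^3 expanded in powers of y
    return 20.0 * sum(c * a1 ** (r + p) / (r + p) for c, p in terms)


def dirichlet_cdf(a1, a2, n_points=5):
    """Approximate F(a1, a2) by the quadrature formula (2)."""
    moments = [truncated_beta24_moment(r, a1) for r in range(2 * n_points)]
    nodes, weights = gauss_rule(moments, n_points)
    # Given y1, Y2 / (1 - y1) is beta(3, 1), whose c.d.f. is u^3 on [0, 1].
    h = np.clip(a2 / (1.0 - nodes), 0.0, 1.0) ** 3
    return float(np.dot(weights, h))


approx = dirichlet_cdf(0.7, 0.3)
```

The value of `approx` can be compared with the exact F(.7, .3) = .1323 quoted in the text; increasing `n_points` beyond a handful is limited in this sketch by the conditioning of the Hankel matrix rather than by the method itself.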


Table 1 contains cdf values for an additional bivariate 
Pearson distribution, Type IVa (c.f., Johnson & Kotz) with 


parameters 0) =4 and. 6, = 7. The first entry is the approxi- 


mation and the second is the exact value of the cdf for the 
indicated standardized values. The precision for these points 
varies between seven and ten places. 


Example 2: Rayleigh. Consider the bivariate Rayleigh distribution
with parameters n and $\rho$ (cf. Johnson & Kotz, 1972).
Each marginal distribution is the chi-square distribution with
n-1 degrees of freedom, thus the truncated marginal moment
approximations will be exact. The distribution of $Y = S_2/(1-\rho^2)$
given $S_1 = s_1$ is the noncentral chi-square with n-1 degrees
of freedom and noncentrality parameter $\rho^2 s_1/(1-\rho^2)$. Hence
$E\{S_2^r|s_1\} = (1-\rho^2)^r\,E\{Y^r|s_1\}$. Further, for the standardized
variables, $X_i = (S_i - \mu_i)/\sigma_i$, where $\mu_i = n-1$ and $\sigma_i^2 = 2(n-1)$,
the conditional moments may be expressed in terms of those of the
noncentral chi-square,

$$E\{X_2^r|x_1\} = \sigma^{-r} \sum_{k=0}^{r} \binom{r}{k} (-\mu)^k (1-\rho^2)^{r-k}\,E\{Y^{r-k}\,|\,\mu + \sigma x_1\}.$$

Letting n = 25 and $\rho$ = .5, the coefficients for computing
conditional moment approximations using the marginal-conditional
technique with standardized moments are given in the matrix below.


TABLE 1: Type IVa cdf approximations (first entries) and exact values (second entries).
$$W = \begin{pmatrix}
0 & 15/16 & 9\sqrt{3}/32 & 1539/512 \\
1/4 & \sqrt{3}/16 & 99/128 & 171\sqrt{3}/256 \\
0 & 1/16 & 3\sqrt{3}/64 & 117/256 \\
0 & 0 & 1/64 & 3\sqrt{3}/128 \\
0 & 0 & 0 & 1/256
\end{pmatrix}$$


For the Rayleigh distribution our approximations to cumulative 
probabilities were found to be consistently accurate to five or 
six decimal places using as few as three points, with only minor 
improvement observed as the number of points increased. For 
example, approximations for F(-1.5, -1.5), F(2,1) and F(5,3) 
were, with seven points, .004301998, .8206635, and .9937327 
compared to exact values of .00430254, .820658, and .993735,
respectively. This should be contrasted with the rather poor 
precision obtained by a bivariate Edgeworth expansion (Aiuppa, 
1975). Similar results were obtained for other parameter values 
(see Parrish, 1978). In the case of n = 3, p = .5, where pre- 
cision might be expected to decay, the approximation provided 
three to six digit accuracy. Inasmuch as the moment evaluations 
here were shown to be exact and in light of the behavior of 
cumulative distribution function approximations for various 
values of n, the error encountered must be attributed to the
approximations of the conditional cumulative distribution function 
values. In fact, by replacing the conditional distribution func- 
tion approximation with exact conditional distribution function 
values and using five point Gaussian quadrature, the bivariate 
cumulative probabilities were obtained to 12-place precision. 
The standardized conditional moments (Johnson & Kotz, 1970) of 


2 : ¥ 
S,/(1-p ) given S, = 8, are 


It may be shown that 0 < 38, = 4B - 9 and 28, - 38, =n bane 


which, when given strict inequality in the latter, indicate that 
the Type I Pearson is the member used to provide the conditional 
cumulative distribution function approximations in this example. 
E. S. Pearson (1963) has considered the use of the Type I Pearson 
distribution as an approximation for the noncentral chi-square. 


ee ee 
Selected comparisons are presented in Table 2 for the Rayleigh
distribution. 


The bivariate class is in no way restricted to members in 
which the conditional distribution belongs to the same type of 
univariate Pearson class, for the different Gaussian points. 

Since the evaluation programs for the Pearsonian distributions 
select the appropriate type automatically (on the basis of the 
third and fourth standardized moments), this is of no concern. 

For non-members of the bivariate Pearson class, exact conditional 
moments are replaced by quartic approximations (since quartics 
produce exact conditional moment values for members of the Pearson 
class). Of interest here is whether the conditional distributions 
remain of the same type for all values of the marginal variate. 

If this presents a restriction of the bivariate Pearson class 

in general, then the technique described here would be utilizing 

a rather more extensive class since, when the approximation is 
performed, the conditional distribution is in no way restricted 

to remain in the same Pearson class type for the different 
Gaussian points. 
TRANSFORMATION OF A DISCRETE DISTRIBUTION TO
NEAR NORMALITY 


FABIAN HERNANDEZ 


University of Wisconsin 
RICHARD A. JOHNSON 
University of Wisconsin 


SUMMARY. Utilizing an information number approach, we propose 
an objective method for the normalization of either discrete 
distributions, or sample counts, by means of a power transfor- 
mation. Approximations are also given to the original known 
probabilities. Next, we derive the large sample distribution of 
our estimate of the power transformation. We compare our methods 
with the Box-Cox procedure, applied to observed counts, and con- 
clude that their technique often provides good approximations 
even though their underlying assumption of normality is clearly 
violated. Two examples illustrate our methods. 


KEY WORDS. Transformations, discrete distributions. 


1. INTRODUCTION AND SUMMARY 


The transformation or 're-expression' of counts is now
common practice for the data analyst. Tukey (1977, p. 83) speci- 
fically mentions some advantages of transforming counts. Our 
procedure selects a 'normalizing' transformation from the family 
$$X^{(\lambda)} = \begin{cases} (X^{\lambda} - 1)/\lambda, & \lambda \ne 0 \\ \log(X), & \lambda = 0 \end{cases} \qquad (1)$$

considered by Box and Cox. It provides an objective way to
determine $\lambda$.


In Section 2, we discuss criteria by which discrete random 
variables, or their transformations, may be judged to be nearly 
normal. Employing the Kullback-Leibler information number, we 
introduce a transformation technique that applies where the true 
underlying distribution is known. Also, we develop a discreti- 
zation of the normal distribution that approximates a given dis- 
crete distribution. 


Sample analogues of the methods in Section 2 are developed
in Section 3 and their asymptotic distribution derived. These
can be used to obtain approximate confidence intervals for the
'best' transformation parameter.


For comparative purposes, we show that the Box-Cox technique 
leads to sensible results, provided that we add a positive con- 
stant. Section 5 includes the re-expression of counts and the 
normalization of raw test scores. 


2. TRANSFORMATION AND APPROXIMATION OF A KNOWN 
DISCRETE DISTRIBUTION 


2.1 Normal Approximation to Transformation of Smoothed Discrete
Random Variables. Let X be a discrete random variable which
takes value i with probability $p_i = P[X=i]$ for $i \ge 0$.
Then $Y = X+U$ is absolutely continuous when U is independent
of X and is uniform on $[c, c+1]$, some fixed $c > 0$. Let Y
have p.d.f. $g(\cdot)$; $V = Y^{(\lambda)}$ have p.d.f. $g_\lambda(\cdot)$ and $\phi_{\mu,\sigma}$
be the p.d.f. of a normal distribution with mean $\mu$ and standard
deviation $\sigma$. In our search for a transformation, we replace the
discrete variable X by the absolutely continuous Y, and then
select a power transformation of Y.


Employing the Kullback-Leibler information number between
$g_\lambda$ and $\phi_{\mu,\sigma}$ as a measure of closeness, we propose to minimize

$$I[g_\lambda; \phi_{\mu,\sigma}] = E_{g_\lambda}\left\{\log\frac{g_\lambda(V)}{\phi_{\mu,\sigma}(V)}\right\} \qquad (2)$$

with respect to $\mu$, $\sigma$ and $\lambda$. Minimizing (2) first over $\mu$
and $\sigma$, we find that the optimal value $\lambda_0$ of $\lambda$ is found by
minimizing

$$G(\lambda) = \tfrac{1}{2}[\log(2\pi) + 1] + E_g\{\log[g(Y)]\} - (\lambda - 1)\,E_g\{\log(Y)\} + \tfrac{1}{2}\log[V_g(Y^{(\lambda)})] \qquad (3)$$

provided that $E_g(Y^{2\lambda})$ and $E_g[\log(Y)]^2$ are finite. Here
$V_g(Y^{(\lambda)}) = E_g[Y^{(\lambda)} - E_g(Y^{(\lambda)})]^2$ and $E_g(\cdot)$ denotes that the
expectation is taken with respect to g.


2.1.1 Proposed procedure for transforming a known discrete distribution.
Replace X by the absolutely continuous random
variable $Y = X+U$, where X has the given discrete distribution
and U, stochastically independent of X, has a uniform distribution
on the interval $[c, c+1)$, for some $c > 0$. To
'normalize' X, we make the p.d.f. of $Y^{(\lambda)}$ 'closest' (in the
information number sense) to a normal p.d.f. by minimizing (2)
with respect to $(\mu,\sigma,\lambda)$.
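
As a rough numerical illustration of this procedure, the sketch below evaluates the information number minimized over $\mu$ and $\sigma$, namely one half of $\log(2\pi) + 1$, plus $E_g\{\log g(Y)\}$, minus $(\lambda - 1)E_g\{\log Y\}$, plus one half of the log variance of $Y^{(\lambda)}$, for a Poisson variable smoothed by a uniform on $[c, c+1)$. Several choices here are assumptions made only for the sketch: the Poisson distribution is truncated after `n_terms` values and renormalized, the search is a plain grid over positive $\lambda$ (so the case $\lambda = 0$ is not handled), and c = 1 with a = 2 is taken as the illustration. The function names are hypothetical.

```python
import math


def poisson_pmf(a, n_terms=60):
    """Poisson(a) probabilities for i = 0..n_terms-1, renormalized after truncation."""
    probs = [math.exp(-a + i * math.log(a) - math.lgamma(i + 1))
             for i in range(n_terms)]
    total = sum(probs)
    return [p / total for p in probs]


def g_objective(lam, probs, c=1.0):
    """Information number after minimizing over mu and sigma (lam != 0)."""
    # Y = X + U has density p_i on [i + c, i + c + 1), so E log g(Y) = sum p_i log p_i.
    e_log_g = sum(p * math.log(p) for p in probs if p > 0.0)
    e_log_y = m1 = m2 = 0.0
    for i, p in enumerate(probs):
        lo, hi = i + c, i + c + 1.0
        e_log_y += p * ((hi * math.log(hi) - hi) - (lo * math.log(lo) - lo))
        m1 += p * (hi ** (lam + 1) - lo ** (lam + 1)) / (lam + 1)          # E Y^lam
        m2 += p * (hi ** (2 * lam + 1) - lo ** (2 * lam + 1)) / (2 * lam + 1)  # E Y^(2 lam)
    var_transformed = (m2 - m1 * m1) / lam ** 2  # variance of (Y^lam - 1)/lam
    return (0.5 * (math.log(2.0 * math.pi) + 1.0) + e_log_g
            - (lam - 1.0) * e_log_y + 0.5 * math.log(var_transformed))


probs = poisson_pmf(2.0)
grid = [k / 100.0 for k in range(5, 201)]            # lambda from 0.05 to 2.00
values = [g_objective(lam, probs) for lam in grid]
g_min = min(values)
lam_0 = grid[values.index(g_min)]
g_identity = g_objective(1.0, probs)
```

Here `g_identity` is the information number of the untransformed variable, and `lam_0` is the grid point at which the criterion is smallest; a finer search or a one-dimensional optimizer could replace the grid.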


Example 1. Let X have the Potsson distribution with parameter 
OCpeLeie =’ = st0 With Uw il.2) (.e., c=L)... The function 
G(-) defined in (3) becomes 


G(A) = const - X }) {(1+1)Rog[1+1/(1+1)] + Log (2+1) }P) (4,0) Bys\ 
i=0 


2A+1 2A+1 


+ 3og{ ) (1+i) {[1+1/(1+i)] = 1}P,(4,0)/(1+ 2a) 


i=0 


A+1 


~{) asaya asatayy - 1p 4.0/0) 
=) 


- Log(]A]) 


for iX #.-, -1. Here 
const = $[\log(2\pi) + 1]/2 + a[\log(a) - 1]$


co 


+ J} {og(i!) + (1+i)Log[2+i)/(14+i)] + Log (2+i) }P, (4,0) 
i=0 


and $P_i(a) = a^i \exp(-a)/i!$.
In Figure 1, we plot $G(\lambda)$ vs $\lambda$ for a = 2, 3, 4 and 5.

According to our criterion, the usual variance stabilizing
transformation (see Bartlett, 1947, p. 41), $\lambda = 1/2$, is a
reasonable choice since it makes $|G(\lambda_0) - G(.5)|$ 'small' for
these values of a. In Table 1, we record $\lambda_0$, the information
number of the transformed variable $I[g_{\lambda_0}; \phi_{\mu(\lambda_0),\sigma(\lambda_0)}]$, and the
information number of the untransformed variable $I[g_1; \phi_{\tilde\mu,\tilde\sigma}]$,
where $\tilde\mu = a + 3/2$ and $\tilde\sigma^2 = a + 1/12$. We also include the
information number $I[g_{.5}; \phi_{\mu(.5),\sigma(.5)}]$ corresponding to the
square root transformation.

2.2 Normal Approximation of Discrete Probabilities. Alternatively
we can approximate the probability $p_i$ by a probability $q_i$
obtained from a normal c.d.f. To measure the accuracy of this
type of approximation we utilize the Kullback-Leibler information
number in its discrete population version (see Kullback, 1968,
p. 128).

Let $d \in (0,1)$. We propose to approximate $p_i$ by


P (A) sito uk) 
a, = 4,0) = oy - gf SO g(a) 
where (+) is the c.d.f. of a standard normal distribution. 
mE EA 
Here do = of EV and if Rahs Os for eto ie 
(X) 
ee Lan pies 2H ‘ 
yom at a * J.“ bet!’ Pt= {p, £9390} ahd 


dee aw tt pegs ea qy defined in (4). We often take d= .5. 
Fig. 1: The function $G(\cdot)$ for the Poisson distribution. (Curves of $G(\lambda)$ against $\lambda$ for a = 2, 3, 4, 5.)


TABLE 1: Transformation and comparison of information numbers
for the Poisson distribution.

a    lambda_0    I[g_lambda_0; phi]    I[g_1; phi]    I[g_.5; phi]
2    .36889      .033815               .08104         .035925
3    .48816      .019620               .05047         .019641
4    .54318      .013090               .03572         .013308
5    .57540      .009729               .02753         .013167
2.2.1 Proposed procedure for the normal approximation of probabilities.
Let $d \in (0,1)$ be fixed. Minimize the Kullback-Leibler
information number


P 
LIPO) = py bes | ay 
ry ps) ei 
Oo 
fo) Pp. 
+ ) p, Log Q) = C) (5) 
ial (itd) %-p,_,,(i-d) Sp 
o[ e ]-$[ . ] 
with respect to 0" = HANS Gty la 


3. TRANSFORMATION OF COUNTS AND NORMAL APPROXIMATION 
OF OBSERVED PROPORTIONS 


3.1 Transformation of Counts. Let X be a discrete random
variable taking the value i with probability $p_i = P[X=i] > 0$,
for $i = 0,1,\ldots,N < \infty$. Let $X_1, X_2, \ldots, X_n$ be i.i.d. as X and
denote the frequency of the value i by $f_i$. Set $I_{\{i\}}(\cdot)$ = the
indicator of $\{i\}$, so that

$$f_i = \sum_{j=1}^{n} I_{\{i\}}(X_j)$$

is the frequency of [X=i] and the relative frequency is

$$\hat p_i = f_i/n \quad\text{for } i = 0,1,\ldots,N. \qquad (6)$$


Once having observed Xp»Xo0°""sX 9 we treat these as possible 
values of a random variable Y where Ya now takes the value 


i with probability ae Next, construct 
Le = 2 + U. Wilts ph died B0) (7) 


where +a and a are stochastically independent and u. has a 
uniform distribution on the interval [8,8+1), for some fixed 
(A) 
> sd = . . e °. . 
Bias Oe emnties te We a have p.d.f g 6 )° and oan ) J bes the 
p-d.f. of a normal distribution with mean wu and standard 


deviation oO. 


Now, suppose we want to transform the data so that they 
appear to come from a normal distribution. We propose to trans- 
form Lae instead of the original observations Ks» IVESC1R.0* -3n 
and to select a transformation ., in such a way that the dis-
A 
tribution of vs ) is 'closest' to a normal distribution in the 


Kullback-Leibler information number sense. That is, we minimize 


g (Ww) 
an, = oe tool | (8) 


with respect to U, Oo and X. Thus, minimizing first with 
respect to wu and o, it is easily shown that the optimal 
value, Aen? of 2 is obtained by minimizing 


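$$G_n(\lambda) = \tfrac{1}{2}[\log(2\pi) + 1] + \tfrac{1}{2}\log \mathrm{Var}\big(Z_n^{(\lambda)}\big) - (\lambda - 1)\,E[\log Z_n] + \sum_{i=0}^{N} p_{in}\log p_{in}. \tag{9}$$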


In summary,

3.1.1 Proposed procedure for transforming counts to near
normality. Having observed $X_1, X_2, \ldots, X_n$, introduce the
discrete random variable $Y_n$ which takes the value $i$ with
probability $p_{in} = \frac{1}{n}\sum_{j=1}^{n} I_{[i]}(X_j)$. Replace $Y_n$ by the absolutely
continuous random variable $Z_n = Y_n + U_n$ where $U_n$ is independent
of $Y_n$ and is uniform on the interval $[\beta, \beta+1)$, some
fixed $\beta \ge 0$. In order to 'normalize' the observations, we
transform $Z_n$ employing Procedure 2.1.1.


The transformation $\lambda$ is selected by minimizing (9) with
respect to $\lambda$.
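
Procedure 3.1.1 can be sketched numerically. The following is a minimal illustration and not the authors' implementation. It assumes the power family $x^{(\lambda)} = (x^{\lambda}-1)/\lambda$ (with $\log x$ at $\lambda = 0$), and it takes as criterion the information number (8) after minimizing over $\mu$ and $\sigma$, that is $\tfrac12[\log(2\pi)+1] + \tfrac12\log\mathrm{Var}(Z^{(\lambda)}) - (\lambda-1)E[\log Z] + \sum_i p_i \log p_i$. The function names, the midpoint-rule quadrature, the grid search and the default $\beta = 0.5$ are all choices made for the example.

```python
import math

def boxcox(z, lam):
    """Power transform z^(lam); the logarithm is used at lam = 0 (assumed family)."""
    return math.log(z) if abs(lam) < 1e-12 else (z ** lam - 1.0) / lam

def cell_mean(func, lo, steps=200):
    """Midpoint-rule average of func over the unit-length cell [lo, lo + 1)."""
    h = 1.0 / steps
    return sum(func(lo + (k + 0.5) * h) for k in range(steps)) * h

def g_n(lam, counts, beta=0.5):
    """Profiled Kullback-Leibler criterion for Z = Y + U, U uniform on [beta, beta + 1)."""
    n = sum(counts)
    m1 = m2 = elog = entropy = 0.0
    for i, c in enumerate(counts):
        if c == 0:
            continue
        p, lo = c / n, i + beta
        m1 += p * cell_mean(lambda z: boxcox(z, lam), lo)
        m2 += p * cell_mean(lambda z: boxcox(z, lam) ** 2, lo)
        elog += p * cell_mean(math.log, lo)
        entropy += p * math.log(p)
    var = m2 - m1 ** 2
    return (0.5 * (math.log(2 * math.pi) + 1) + 0.5 * math.log(var)
            - (lam - 1) * elog + entropy)

def best_lambda(counts, beta=0.5, grid=None):
    """Grid minimizer of g_n; the grid from -2 to 2 in steps of 0.05 is arbitrary."""
    grid = grid or [k / 20 for k in range(-40, 41)]
    return min(grid, key=lambda lam: g_n(lam, counts, beta))
```

For example, `best_lambda([5, 20, 30, 25, 12, 5, 2, 1])` returns the grid value of $\lambda$ at which the smoothed counts are closest to normal in this sense. A finer grid or a one-dimensional optimizer could replace the grid search.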


The asymptotic behavior of $\theta_{0n} = (\mu_{0n}, \sigma_{0n}, \lambda_{0n})'$, i.e., the
value of $\theta$ that minimizes (8), is given in the following result.


Theorem 3.1. Let $X$ have a discrete distribution with
$p_i = P[X=i]$, $i \le N$, and let $W = (X+U)^{(\lambda)}$ have p.d.f. $g_\lambda(\cdot)$ where $U$
is uniform and independent of $X$. Set $\theta' = (\theta_1, \theta_2, \theta_3) = (\mu, \sigma, \lambda)$
and suppose that the following conditions are satisfied:
  i) The parameter space $\Theta$ is a compact set given by

$$\Theta = \{\theta = (\mu,\sigma,\lambda)' : |\mu| \le M,\ c \le \sigma \le d,\ a \le \lambda \le b;\ -\infty < a < 0 < b,\ c, d, M < \infty\}.$$

  ii) $H(\theta) = I[g_\lambda; \phi_{\mu\sigma}]$ has a unique minimum at
$\theta_0 = (\mu_0, \sigma_0, \lambda_0)' \in \Theta$.

Then,

  1) $\lim_{n\to\infty} \theta_{0n} = \theta_0$ with probability one.

If we further assume that

  iii) $\theta_0$ is an interior point of $\Theta$ and
  iv) $\nabla^2 H(\theta_0) = \left(\dfrac{\partial^2 H(\theta)}{\partial\theta_i\,\partial\theta_j}\right)\Big|_{\theta=\theta_0}$ is non-singular,

then

  2) $\sqrt{n}\,(\theta_{0n} - \theta_0) \xrightarrow{d} N_3(0, VWV')$ where $V = [\nabla^2 H(\theta_0)]^{-1}$ and
$W = [E[C_i(X_1,\theta_0)\,C_j(X_1,\theta_0)]]$ with


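$$C_1(X_1,\theta) = -\sum_{i=0}^{N} \int_{i+\beta}^{i+\beta+1} \frac{v^{(\lambda)}-\mu}{\sigma^2}\, dv\; I_{[i]}(X_1),$$

$$C_2(X_1,\theta) = \sum_{i=0}^{N} \int_{i+\beta}^{i+\beta+1} \left[\frac{1}{\sigma} - \frac{(v^{(\lambda)}-\mu)^2}{\sigma^3}\right] dv\; I_{[i]}(X_1),$$

$$C_3(X_1,\theta) = \sum_{i=0}^{N} \int_{i+\beta}^{i+\beta+1} \left[\frac{v^{(\lambda)}-\mu}{\sigma^2}\,\frac{\partial v^{(\lambda)}}{\partial\lambda} - \log(v)\right] dv\; I_{[i]}(X_1).$$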


Proof. See Hernandez and Johnson (1979). 


Remark. Theorem 3.1 says that $\theta_{0n}$ converges, with probability
one, to $\theta_0$, the value of $\theta$ that minimizes the Kullback-Leibler
information number between $g_\lambda$, the p.d.f. of $(X+U)^{(\lambda)}$, and a
normal p.d.f. Hence, Procedure 2.1.1 can be interpreted as the
infinite-sample analogue of the current technique.


3.2 A Normal Approximation to Observed Proportions. We want to
approximate observed proportions (6) by a set of normal probabilities
$Q(\theta) = \{q_i(\theta): 0 \le i \le N\}$ given by (4).

3.2.1 Proposed procedure to approximate observed proportions. In
order to approximate $P_n = \{p_{in} : 0 \le i \le N\}$ by a collection
of normal probabilities (4) we minimize the Kullback-Leibler
information number

$$I[P_n; Q(\theta)] = \sum_{i=0}^{N} p_{in} \log \frac{p_{in}}{q_i(\theta)} \tag{10}$$

with respect to $\theta$.


Theorem 3.2. Let $X$ assume the value $i$ with probability $p_i$
for $i = 0,1,\ldots,N$, where $N$ is finite. Let $X_1,\ldots,X_n$ be
i.i.d. with the same distribution as $X$ and $p_{in}$ be the
observed proportion of the value $i$. Set $\theta' = (\theta_1, \theta_2, \theta_3)$
$= (\mu,\sigma,\lambda)$ and assume that the following conditions are satisfied.
  i) The parameter space $\Theta$ is as in Theorem 3.1,
  ii) $F(\theta) = I[P;Q(\theta)] = \sum_{i=0}^{N} p_i \log \dfrac{p_i}{q_i(\theta)}$ has a unique
minimum at $\theta_0$.
Then, $\theta_n$, the value which minimizes (10), satisfies

  1) $\lim_{n\to\infty} \theta_n = \theta_0$, with probability one.

If we further assume that

  iii) $\theta_0$ is an interior point of $\Theta$ and
  iv) $\nabla^2 F(\theta_0) = \left(\dfrac{\partial^2 F(\theta)}{\partial\theta_i\,\partial\theta_j}\right)\Big|_{\theta=\theta_0}$ is non-singular,

then

  2) $\sqrt{n}\,(\theta_n - \theta_0) \xrightarrow{d} N_3(0, VWV')$ where $V = [\nabla^2 F(\theta_0)]^{-1}$ and

$$W = \left[\sum_{k=0}^{N} p_k\, \frac{\partial \log q_k(\theta)}{\partial\theta_i}\, \frac{\partial \log q_k(\theta)}{\partial\theta_j}\,\Big|_{\theta=\theta_0}\right].$$

Proof. See Hernandez and Johnson (1979).


4. A COMPARISON WITH THE BOX-COX PROCEDURE APPLIED
TO DISCRETE OBSERVATIONS


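In our model $[X=0]$ may have positive probability so we
consider $Y = X + c$ with $c > 0.5$. The Box and Cox (1964)
method selects $\lambda$ by maximizing

$$\ell(\lambda) = -\frac{n}{2}[\log(2\pi)+1] - \frac{n}{2}\log\left\{\frac{1}{n}\sum_{i=1}^{n}\big(y_i^{(\lambda)} - \bar y^{(\lambda)}\big)^2\right\} + (\lambda-1)\sum_{i=1}^{n}\log(y_i) \tag{11}$$

with respect to $\lambda$.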
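
As a small illustration of (11), the sketch below evaluates the profile log-likelihood for a list of positive observations. It assumes the power family $y^{(\lambda)} = (y^{\lambda}-1)/\lambda$ with $\log y$ at $\lambda = 0$, and that the variance estimate divides by $n$. The function names are hypothetical.

```python
import math

def boxcox(y, lam):
    """Power transform y^(lam); the logarithm is used at lam = 0 (assumed family)."""
    return math.log(y) if abs(lam) < 1e-12 else (y ** lam - 1.0) / lam

def boxcox_loglik(lam, ys):
    """Profile log-likelihood (11) for positive observations ys."""
    n = len(ys)
    t = [boxcox(y, lam) for y in ys]
    mean = sum(t) / n
    s2 = sum((v - mean) ** 2 for v in t) / n   # maximum likelihood variance
    return (-0.5 * n * (math.log(2 * math.pi) + 1)
            - 0.5 * n * math.log(s2)
            + (lam - 1) * sum(math.log(y) for y in ys))
```

For counts $x_1,\ldots,x_n$ one would call `boxcox_loglik(lam, [x + c for x in xs])` over a range of `lam` values and keep the maximizer.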


Let $Y_1 = X + U$ where $U$ is independent of $X$ and is
uniform on $[c-\tfrac12, c+\tfrac12]$. Procedure 2.1.1, the large sample limit
of Procedure 3.1.1, selects $\lambda$ to minimize $G(\lambda)$. Asymptotically
the Box-Cox approach, applied directly to the discrete observations,
requires the minimization of a function $-\phi(\lambda)$ and

$$G(\lambda) = -\phi(\lambda) + \text{constant} + \text{Error}$$


where 
N ites 
Error=i ) p, f  [og(itc)-Log(y) ]dy 
. Lars 1 
i=0 itc-%- 
- khog 
itcts 
When f [Rog(y)-Log (ite) ]dy| < 4 ——— (13) 
ieee itc- 


and, ‘for “ry="Liy2 


itcts 
Xr = 
Faacee ta iaa tte TEAL (i4e-3)7* Livin sherman tran 


1+c-5 


are small, we would expect the Error to be small. Consequently,
the Box-Cox procedure and Procedure 3.1.1 should give nearly the
same answer when the sample size is large.


From (13) and (14) making c large appears to improve the
agreement between the two procedures. The most frequently


employed transformations /X+c, oe Rog (X+c) and Cae 
satisfy rd<l. 


5. APPLICATION TO THE RE-EXPRESSION OF COUNTS 
AND NORMALIZATION OF TEST SCORES 


Example 2. Tukey (1977, p. 572), displays the following counts 
for the duration (in days) of incubation for 1663 eggs of the 
ridley turtle 


days SOL 752 99554 55 56°57 58-59 60-61-/7- 64> 65 


number 
of eggs 


PR EBZ2°TOSC32L 725° RS0C162 21 9141 6. 19°14 LSet 


He suggests the re-expression $\sqrt{\text{days} - 49}$.


Setting 8 = 0, we apply the Proposed Procedure 3.1.1 to 


s = 1 
the above data and obtain ®% 1663 (B35 701 393 08 Laie 


Moreover, using the limiting distribution of $\theta_{0n}$, derived in
Theorem 3.1, we can establish an approximate confidence interval
for $\lambda_0$. The estimated standard error of $\lambda_{0,1663}$ is
$\sqrt{6.015/1663}$. Hence, an approximate 95% confidence interval for
$\lambda_0$ is (0.697, 0.933). Notice that $\lambda = \tfrac12$ is not included in
the interval.
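
The arithmetic behind this interval can be checked directly. The sketch below uses only the two quantities quoted in the text (the variance term 6.015 with n = 1663, and the reported end points) and assumes the usual normal-theory multiplier 1.96; the point estimate is not taken from the text but recovered as the midpoint of the interval.

```python
import math

se = math.sqrt(6.015 / 1663)          # estimated standard error quoted in the text
half_width = 1.96 * se                # assumed normal-theory 95% multiplier
lower, upper = 0.697, 0.933           # interval reported in the text
centre = (lower + upper) / 2          # implied point estimate of lambda

print(round(se, 4), round(half_width, 3), round(centre, 3))
# prints: 0.0601 0.118 0.815
```

The implied centre minus the half width reproduces the lower limit 0.697 to three decimals, and the value 0.5 lies more than five standard errors below the centre.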


Example 3. Ghiselli (1964, p. 78) proposes the use of the
square root transformation for the normalization of the 100 test
scores. We set $\beta = 0$. The application of the Proposed Procedure
3.1.1 yields $\theta_{0,100} = (4.66, 0.54, 0.070)'$. Utilizing the
limiting distribution of $\theta_{0n}$, we obtain the estimated standard
error of $\lambda_{0,100}$ and then the approximate 95% confidence interval
(-0.30, 0.44) for $\lambda_0$. Figure 2 presents a comparison of the
relative frequency histograms of a) the transformed scores and
b) the original scores. We also applied the Box-Cox procedure
to the above scores. The estimated power transformation is
$\hat\lambda = 0.078$ which is in good agreement with the value
$\lambda_{0,100} = 0.070$.


FIG. 2: Relative frequency histogram of (a) the transformed
scores and (b) the original scores.


The sample procedures introduced in Section 3 can also be
applied to situations where the observations can only be ordered.
That is, the observations can be assigned to exactly one of the
categories $1,2,\ldots,N$. Here $N$ corresponds to the highest
category, $N-1$ to the second highest, etc. With this scoring,
we are able to apply our methods to the relative frequencies of
the categories.


REFERENCES 


Bartlett, M. S. (1947). The use of transformations. Biometrics,
3, 39-52.

Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations.
Journal of the Royal Statistical Society, Series B,
26, 211-243; discussion, 244-252.

Ghiselli, E. E. (1964). Theory of Psychological Measurement. 
McGraw-Hill, New York. 

Hernandez, F. and Johnson, R. A. (1979). Transformation of a 
discrete distribution to near normality. Technical Report 
No. 546, Department of Statistics, University of Wisconsin. 

Kullback, S. (1968). Information Theory and Statistics. Dover, 
New York. 

Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley,
Reading, Massachusetts.


[Received May 1980. Revised October 1980]


MULTIVARIATE DISTRIBUTIONS IN RELIABILITY THEORY
AND LIFE TESTING
SUMMARY. Multivariate parametric distributions which are of
interest in reliability theory and life testing are discussed. 
These include distributions with exponential, Weibull and gamma 
univariate marginal distributions. Other distributions of interest
are the multivariate nonparametric distributions whose marginals
have increasing failure rates (IFR), increasing failure
rate averages (IFRA), are new better than used (NBU) or new 
better than used in expectation (NBUE). Also mentioned are 
univariate and multivariate processes which have associated 
with them distributions in the various nonparametric classes 
mentioned above. 


KEY WORDS. multivariate exponential distributions, multivariate 
Weibull distributions, multivariate gamma distributions, multi- 
variate exponential extensions, shock models, threshold model, 
gestation model, characteristic function equation, multivariate 
IFR, multivariate IFRA, multivariate NBU, multivariate NBUE. 


1. INTRODUCTION 


In this paper we discuss the various multivariate parametric 
and nonparametric classes of distributions which are of interest 
in reliability theory and life testing. 


In Section 2 we concentrate on parametric distributions 
whose univariate marginals are either exponential, Weibull or 
gamma. We also discuss multivariate exponential extensions which
are distributions whose marginals are not generally exponential
but which were formulated utilizing concepts based on univariate 
exponential distributions. 


Multivariate nonparametric classes are discussed in Section 
3. These classes contain multivariate distributions which have 
increasing failure rate or increasing failure rate average or 
are new better than used or new better than used in expectation. 
Many formulations have been given for each of these classes, so 
that in this section only the most recent or the most established 
formulations are discussed in detail. 


The paper concludes with a discussion of univariate and 
multivariate stochastic processes which are related to the non- 
parametric classes to which Section 3 was devoted. 


2. PARAMETRIC DISTRIBUTIONS 


2.1 Introduction. The univariate parametric distributions
which have been most useful in reliability theory have been the 
exponential and Weibull distributions. Others which have been 
of some importance are the lognormal and the gamma distributions. 
A history of the use of these distributions is given in Chapter 
1 of Barlow and Proschan (1965). An introduction to these dis- 
tributions, models from which they arise, their properties and 
their use in reliability theory are contained in Mann, Shafer 
and Singpurwalla (1974), and in Barlow and Proschan (1965, 
1975). See also Bain (1978). For more detailed expositions on 
these distributions and comprehensive bibliographies see Johnson 
and Kotz (1972). 


Multivariate parametric distributions which are analogs of
the univariate distributions previously mentioned are still being
developed. Unlike the multivariate normal distribution, for
most other multivariate distributions having marginals of one
type there are many possible dependence structures and consequently
many multivariate versions. Several multivariate exponential
and related distributions, for example, have been developed.
Many of these, along with their properties, are given in Basu
and Block (1975) and in Block (1975). See also Johnson and Kotz
(1972).


2.2 Bivariate Exponential and Related Distributions. Most of
the distributions treated in this section have multivariate 
analogs, but for the purposes of clarity and exposition we will 
treat the bivariate case first. As with all the distributions 
discussed we will say a distribution is a multivariate "---" if
all of its univariate marginals are "---". Therefore a multivariate
exponential distribution will be one whose univariate
marginals are exponential. We will say a distribution is a
multivariate exponential extension if it is derived from properties
of a univariate exponential. The univariate marginals of
such a distribution will not necessarily be exponential.


Freund distribution. Freund (1961) introduced one of the first
bivariate distributions based on a model involving the exponential
distribution. An interpretation of this model, given in
Block (1975), is the following. Consider a two component system
where the failure of one component affects the lifetime of the
other component. Let $Y_1$ and $Y_2$ have independent exponential
distributions with means $\alpha_1^{-1}$ and $\alpha_2^{-1}$ respectively (the initial
distributions of the unaffected component life-times). Let $Y_1'$
and $Y_2'$ be independent of $Y_1$ and $Y_2$ and have independent
exponential distributions with means $(\alpha_1')^{-1}$ and $(\alpha_2')^{-1}$ (the
distributions of the affected components). Then it is easily
shown that the distribution of the component lifetimes

$$(X_1, X_2) = \begin{cases} (Y_1,\ Y_1 + Y_2') & \text{if } Y_1 < Y_2, \\ (Y_2 + Y_1',\ Y_2) & \text{if } Y_2 < Y_1, \end{cases}$$

has the survival function

$$\bar F(x_1,x_2) = \begin{cases} \dfrac{\alpha_1}{\alpha_1+\alpha_2-\alpha_2'} \exp[-(\alpha_1+\alpha_2-\alpha_2')x_1 - \alpha_2' x_2] + \dfrac{\alpha_2-\alpha_2'}{\alpha_1+\alpha_2-\alpha_2'} \exp[-(\alpha_1+\alpha_2)x_2] & \text{if } x_1 < x_2, \\[2ex] \dfrac{\alpha_2}{\alpha_1+\alpha_2-\alpha_1'} \exp[-\alpha_1' x_1 - (\alpha_1+\alpha_2-\alpha_1')x_2] + \dfrac{\alpha_1-\alpha_1'}{\alpha_1+\alpha_2-\alpha_1'} \exp[-(\alpha_1+\alpha_2)x_1] & \text{if } x_2 < x_1. \end{cases} \tag{1}$$
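
The two-component interpretation can be simulated and compared with the survival function (1). This is only a sketch under the model as described above: the function names are hypothetical, `a1p` and `a2p` stand for $\alpha_1'$ and $\alpha_2'$, and the formula assumes $\alpha_1+\alpha_2-\alpha_1'$ and $\alpha_1+\alpha_2-\alpha_2'$ are nonzero.

```python
import math
import random

def freund_survival(x1, x2, a1, a2, a1p, a2p):
    """Survival function (1); a1p and a2p stand for alpha_1' and alpha_2'."""
    if x1 <= x2:
        g = a1 + a2 - a2p              # assumed nonzero
        return ((a1 / g) * math.exp(-g * x1 - a2p * x2)
                + ((a2 - a2p) / g) * math.exp(-(a1 + a2) * x2))
    g = a1 + a2 - a1p                  # assumed nonzero
    return ((a2 / g) * math.exp(-a1p * x1 - g * x2)
            + ((a1 - a1p) / g) * math.exp(-(a1 + a2) * x1))

def simulate_freund(n, a1, a2, a1p, a2p, seed=0):
    """Draw n lifetime pairs (X1, X2) from the two-component model."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        y1, y2 = rng.expovariate(a1), rng.expovariate(a2)
        if y1 < y2:    # component 1 fails first; component 2 continues at rate a2p
            pairs.append((y1, y1 + rng.expovariate(a2p)))
        else:          # component 2 fails first; component 1 continues at rate a1p
            pairs.append((y2 + rng.expovariate(a1p), y2))
    return pairs

def empirical_survival(pairs, x1, x2):
    """Fraction of simulated pairs with X1 > x1 and X2 > x2."""
    return sum(1 for u, v in pairs if u > x1 and v > x2) / len(pairs)
```

With a large sample, `empirical_survival(pairs, x1, x2)` should agree with `freund_survival(x1, x2, ...)` up to Monte Carlo error on both sides of the diagonal $x_1 = x_2$.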


Properties of this distribution are given in Johnson and Kotz
(1972).


Marshall and Olkin and a related distribution. The distribution
of Marshall and Olkin (1967a) has survival function

$$\bar F(t_1,t_2) = P(T_1 > t_1,\ T_2 > t_2) = \exp[-\lambda_1 t_1 - \lambda_2 t_2 - \lambda_{12}\max(t_1,t_2)]$$

for $t_1 \ge 0$, $t_2 \ge 0$,

where $\lambda_1$, $\lambda_2$, $\lambda_{12}$ are nonnegative. This distribution is


derivable from 1) a fatal shock model, 2) a nonfatal shock model, 


and 3) a loss of memory model. For details on these models and 
for properties of this important distribution see Johnson and 
Kotz (1972) and Barlow and Proschan (1975). Estimation and 
testing for this model have been carried out in Bemis, Bain

and Higgins (1972), Bhattacharya and Johnson (1973) and Proschan 
and Sullo (1975). A more general version of this distribution 
is given by Marshall and Olkin (1967b). 
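
The fatal shock derivation can be illustrated by simulation. The sketch assumes the standard construction in which three independent exponential shock times $Z_1$, $Z_2$, $Z_{12}$ with rates $\lambda_1$, $\lambda_2$, $\lambda_{12}$ give $T_1 = \min(Z_1, Z_{12})$ and $T_2 = \min(Z_2, Z_{12})$; the function names are hypothetical and all three rates are taken to be strictly positive.

```python
import math
import random

def mo_survival(t1, t2, l1, l2, l12):
    """Marshall-Olkin survival function exp[-l1*t1 - l2*t2 - l12*max(t1, t2)]."""
    return math.exp(-l1 * t1 - l2 * t2 - l12 * max(t1, t2))

def simulate_mo(n, l1, l2, l12, seed=0):
    """Fatal shock model: each component fails at its own or the common shock."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2, z12 = rng.expovariate(l1), rng.expovariate(l2), rng.expovariate(l12)
        pairs.append((min(z1, z12), min(z2, z12)))
    return pairs
```

Simultaneous failures occur whenever the common shock arrives first, which happens with probability $\lambda_{12}/(\lambda_1+\lambda_2+\lambda_{12})$; this is the singular part of the distribution.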


A recent characterization of this distribution due to Block
(1977a) is that $(T_1,T_2)$ has this distribution if and only if
$T_1$ and $T_2$ are marginally exponential and $\min(T_1,T_2)$ is
exponential and independent of $T_1 - T_2$. Other characterizations
of this distribution are given in Galambos and Kotz (1978)
and in Basu and Block (1975).


A distribution which is closely related to the Marshall and 
Olkin distribution has been studied by Block and Basu (1974). 
It is also closely related to the Freund distribution and can be 
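obtained from the interpretation of Freund's model given previously
where the effect of one component on the other is a
strain, i.e., $\alpha_1 < \alpha_1'$, $\alpha_2 < \alpha_2'$. This distribution is obtained
from (1) with the choices

$$\alpha_i = \lambda_i + \lambda_{12}\lambda_i/(\lambda_1+\lambda_2), \qquad \alpha_i' = \lambda_i + \lambda_{12}, \qquad i = 1,2,$$

where $\lambda_1$, $\lambda_2$, $\lambda_{12}$ are nonnegative and $\lambda_1 + \lambda_2 > 0$. It turns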


out that this distribution is also derivable from a loss of 
memory model similar to that of Marshall and Olkin. Furthermore 
this distribution is the absolutely continuous part of the 
Marshall and Olkin distribution. It should also be noted that 
the lifetime of the two-organ system of Gross, Clark and Liu 
(1971) and also of the two-organ subsystem of Gross (1973) is a 
special case of the maximum lifetime of this distribution. See 
the original paper for details, estimation and other properties.
Estimation and testing have also been carried out by Mehrotra
and Michalek (1976) and by Gross and Lam (1979). The latter 
authors consider an application of this distribution to bivar- 
iate relapse times for patients receiving different treatments. 


Friday and Patil distribution. Proschan and Sullo (1974) briefly 
suggest a model which incorporates both the Marshall and Olkin 
and the Freund distributions. Friday and Patil (1977) pursue 

the idea of a distribution containing these two distributions 
still further. They develop a similar although more general 
distribution than that of Proschan and Sullo which is derivable 
from (1) a threshold model, (2) a gestation model and (3) a 
warmup model. The survival function can be written as 


$$\bar F(x_1,x_2) = \alpha_0 \bar F_A(x_1,x_2) + (1-\alpha_0)\,\bar F_B(x_1,x_2) \tag{2}$$

where $\bar F_A(x_1,x_2)$ is given by (1), i.e., the Freund distribution,
and

$$\bar F_B(x_1,x_2) = \exp[-(\alpha_1+\alpha_2)\max(x_1,x_2)] \quad \text{if } x_1 > 0,\ x_2 > 0.$$

It is clear that for $\alpha_0 = 1$ equation (2) gives the Freund
distribution. It should be noticed that this equation is of the
form of the Marshall and Olkin distribution (see Theorem 1.5,
Chapter 5 of Barlow and Proschan (1975)). The choice of parameters
$\alpha_0 = (\lambda_1+\lambda_2)/(\lambda_1+\lambda_2+\lambda_{12})$, where $\lambda_1$, $\lambda_2$ and $\lambda_{12}$
are positive and $\alpha_1$, $\alpha_1'$, $\alpha_2$, $\alpha_2'$ as chosen for the Block and
Basu distribution, yields the Marshall and Olkin distribution.
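
The last statement can be verified numerically. The sketch below assumes the Block and Basu parameter choice $\alpha_i = \lambda_i + \lambda_{12}\lambda_i/(\lambda_1+\lambda_2)$, $\alpha_i' = \lambda_i + \lambda_{12}$, the mixing weight $\alpha_0 = (\lambda_1+\lambda_2)/(\lambda_1+\lambda_2+\lambda_{12})$, and a singular component $\exp[-(\alpha_1+\alpha_2)\max(x_1,x_2)]$; the function names are hypothetical.

```python
import math

def freund_survival(x1, x2, a1, a2, a1p, a2p):
    """Freund survival function (1); a1p, a2p stand for alpha_1', alpha_2'."""
    if x1 <= x2:
        g = a1 + a2 - a2p
        return ((a1 / g) * math.exp(-g * x1 - a2p * x2)
                + ((a2 - a2p) / g) * math.exp(-(a1 + a2) * x2))
    g = a1 + a2 - a1p
    return ((a2 / g) * math.exp(-a1p * x1 - g * x2)
            + ((a1 - a1p) / g) * math.exp(-(a1 + a2) * x1))

def block_basu_parameters(l1, l2, l12):
    """Return (alpha_1, alpha_2, alpha_1', alpha_2') for the assumed choice."""
    s = l1 + l2
    return l1 + l12 * l1 / s, l2 + l12 * l2 / s, l1 + l12, l2 + l12

def friday_patil_survival(x1, x2, a0, a1, a2, a1p, a2p):
    """Mixture (2): weight a0 on the Freund part, 1 - a0 on the singular part."""
    singular = math.exp(-(a1 + a2) * max(x1, x2))
    return a0 * freund_survival(x1, x2, a1, a2, a1p, a2p) + (1 - a0) * singular

def marshall_olkin_survival(x1, x2, l1, l2, l12):
    return math.exp(-l1 * x1 - l2 * x2 - l12 * max(x1, x2))
```

Evaluating both survival functions at a few points on either side of the diagonal shows agreement to rounding error, which is the decomposition of the Marshall and Olkin distribution into its absolutely continuous and singular parts.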


Also contained in the paper of Friday and Patil is a nice 
summary of the relations among the various distributions just 
discussed. Also discussed are transformations from independence 
for the distribution and computer generation and efficiency for 
this distribution. 


Downton distribution. This distribution is a special case of a
classical bivariate gamma distribution due to Wicksell and to
Kibble. See Krishnaiah and Rao (1961) for a discussion of this
gamma distribution and references. Downton (1970) developed a
model which gave rise to this bivariate exponential distribution
and proposed its use in the setting of reliability theory. 


An interpretation of this model, which is due to Arnold 
(1975b) is presented here. Consider a two component system where 
the components are each subjected to nonfatal shocks which occur 
according to two independent Poisson processes. Assume the 
processes have rates $(1-\rho)/\mu_1$ and $(1-\rho)/\mu_2$ respectively where
$0 \le \rho < 1$, $0 < \mu_1$, $0 < \mu_2$. Let $\{X_{i1}\}_{i=1}^{\infty}$ and $\{X_{i2}\}_{i=1}^{\infty}$
represent interarrival times for each of the two processes.
Assume that each component fails after $N_1$ and $N_2$ shocks
respectively where $N_1 = N_2 = N$ is a geometric random variable
with parameter $1-\rho$ (i.e., $P\{N=n\} = \rho^{n-1}(1-\rho)$, $n = 1,2,\ldots$).


The times to failure of the two components are then given by

$$(Y_1, Y_2) = \left(\sum_{i=1}^{N_1} X_{i1},\ \sum_{i=1}^{N_2} X_{i2}\right). \tag{3}$$


By conditioning on $N$ and using characteristic functions it is
easily seen that $(Y_1,Y_2)$ has density

$$f(y_1,y_2) = \frac{\mu_1\mu_2}{1-\rho}\exp\left[-\frac{\mu_1y_1+\mu_2y_2}{1-\rho}\right] I_0\left[\frac{2\sqrt{\rho\,\mu_1\mu_2\,y_1y_2}}{1-\rho}\right] \tag{4}$$

for $y_1 > 0$, $y_2 > 0$, where $I_0$ is the modified Bessel function
of the first kind of order 0. This is the bivariate exponential
distribution given by (2.10) of Downton (1970).
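
The compounding construction is easy to simulate. The sketch below is an illustration only and makes one explicit assumption about the parametrization: the shocks to component $i$ are given intensity $\mu_i/(1-\rho)$, because a geometric sum with parameter $1-\rho$ of exponentials with that intensity is exponential with rate $\mu_i$, which is the marginal implied by Downton's density. Under this assumption the correlation between the two lifetimes equals $\rho$. The function names are hypothetical.

```python
import random

def simulate_downton(n, mu1, mu2, rho, seed=0):
    """Common geometric number of shocks, independent exponential interarrivals."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        shocks = 1
        while rng.random() < rho:      # P{N = k} = rho**(k - 1) * (1 - rho)
            shocks += 1
        # assumed shock intensities mu_i / (1 - rho), see the note above
        y1 = sum(rng.expovariate(mu1 / (1 - rho)) for _ in range(shocks))
        y2 = sum(rng.expovariate(mu2 / (1 - rho)) for _ in range(shocks))
        pairs.append((y1, y2))
    return pairs

def correlation(pairs):
    """Sample product-moment correlation of the two coordinates."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    cov = sum((a - m1) * (b - m2) for a, b in pairs) / n
    v1 = sum((a - m1) ** 2 for a, _ in pairs) / n
    v2 = sum((b - m2) ** 2 for _, b in pairs) / n
    return cov / (v1 * v2) ** 0.5
```

With `mu1 = 1`, `mu2 = 2` and `rho = 0.5`, the sample means should be close to 1 and 0.5 and the sample correlation close to 0.5.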


Downton also shows that if instead of $N_1 = N_2$ in his
derivation (and equivalently in the above derivation) we let
$(N_1,N_2)$ assume various other bivariate geometric distributions,
a distribution of the same form as (4) is obtained.


As mentioned initially the above distribution can be obtained 
as the special case of a particular bivariate gamma distribution. 
This distribution is obtained as follows. Let $(X_1,Y_1)$,
$(X_2,Y_2), \ldots, (X_n,Y_n)$ be iid with a bivariate standard normal
distribution with correlation $\rho$. Then

$$\left(\sum_{i=1}^{n} X_i^2,\ \sum_{i=1}^{n} Y_i^2\right)$$

has a correlated bivariate gamma (chi-square) distribution. For
the case n=2, a distribution of the form (4) is obtained.


Hawkes distribution. The bivariate exponential distribution of
Hawkes (1972) is obtained from the same model as that of Downton.
The only difference is in the choice of the bivariate geometric
distribution $(N_1,N_2)$. This bivariate geometric distribution,
which was derived by Hawkes, was derived independently by Esary
and Marshall (1973) and by Arnold (1975a). The model given here
is from the Esary and Marshall paper.


Consider two devices which receive nonfatal shocks at discrete
time periods labeled by the positive integers. The occurrence
of these shocks is independent and at each cycle there is
a probability $p_{11}$ that both devices survive the shocks,
probability $p_{10}$ that the first survives but the second does not,
probability $p_{01}$ that the second survives and the first does
not and $p_{00}$ that both devices fail. Then letting $N_i$ be the
number of shocks to failure for devices $i = 1,2$, and conditioning
on the occurrence of the first cycle, as in Arnold (1975b),
it is easily seen that the characteristic function $\phi(t_1,t_2)$
satisfies

$$\phi(t_1,t_2) = e^{it_1+it_2}\,[p_{00} + p_{01}\phi(0,t_2) + p_{10}\phi(t_1,0) + p_{11}\phi(t_1,t_2)]. \tag{5}$$

This is essentially the characteristic function equation of
Paulson and Uppuluri (1972b) and is easily solved (i.e. take
$t_1 = 0$ and solve for $\phi(0,t_2)$, then for $\phi(t_1,0)$ similarly,
then for $\phi(t_1,t_2)$). The bivariate distribution which has this
characteristic function is given by

$$P(N_1 > n_1,\ N_2 > n_2) = \begin{cases} p_{11}^{\,n_1}\,(p_{01}+p_{11})^{\,n_2-n_1} & \text{if } n_1 \le n_2, \\ p_{11}^{\,n_2}\,(p_{10}+p_{11})^{\,n_1-n_2} & \text{if } n_2 \le n_1. \end{cases} \tag{6}$$
p 2G +p_.) Le ie nh, Segal 
TESS aba 2 il 
Using this distribution in the model of Downton, Hawkes in 
slightly different notation (i.e., Pay = Pyi for all 
i = 0,1, j = 0,1) obtains his distribution. By taking 
$p_{01} = p_{10} = 0$ it can be seen that $N_1 = N_2$ and so the Downton
distribution is a special case of Hawkes. The resulting transform
is given in Hawkes (1972) along with some properties.
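
The discrete shock model behind the bivariate geometric distribution can be simulated cycle by cycle and compared with the joint survival probabilities $p_{11}^{n_1}(p_{01}+p_{11})^{n_2-n_1}$ for $n_1 \le n_2$ and $p_{11}^{n_2}(p_{10}+p_{11})^{n_1-n_2}$ otherwise. This is a minimal sketch with hypothetical function names; it assumes $p_{10}+p_{11} < 1$ and $p_{01}+p_{11} < 1$ so that both devices eventually fail.

```python
import random

def survival_formula(n1, n2, p01, p10, p11):
    """P(N1 > n1, N2 > n2) for the bivariate geometric distribution."""
    if n1 <= n2:
        return p11 ** n1 * (p01 + p11) ** (n2 - n1)
    return p11 ** n2 * (p10 + p11) ** (n1 - n2)

def draw_shock_counts(rng, p01, p10, p11):
    """Run cycles until both devices have failed; return (N1, N2)."""
    n1 = n2 = None
    cycle = 0
    while n1 is None or n2 is None:
        cycle += 1
        u = rng.random()
        # [0, p11): both survive; [p11, p11 + p10): only the first survives;
        # [p11 + p10, p11 + p10 + p01): only the second survives; else both fail
        first_survives = u < p11 + p10
        second_survives = u < p11 or (p11 + p10 <= u < p11 + p10 + p01)
        if n1 is None and not first_survives:
            n1 = cycle
        if n2 is None and not second_survives:
            n2 = cycle
    return n1, n2

def empirical_survival(draws, n1, n2):
    return sum(1 for a, b in draws if a > n1 and b > n2) / len(draws)
```

Setting `p01 = p10 = 0` forces both devices to fail on the same cycle, which is the special case leading to the Downton distribution.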


Paulson distribution. Paulson (1973) derives a bivariate expon- 
ential distribution through a characteristic function equation. 
This equation is the generalization of a one-dimensional charac- 
teristic function equation which arises from a compartment model 
(see Paulson and Uppuluri, 1972a). A generalization of the compartment
model also leads to the bivariate equation.
The bivariate equation is given by

$$\phi(t_1,t_2) = \psi(t_1,t_2)\,[p_{00} + p_{01}\phi(0,t_2) + p_{10}\phi(t_1,0) + p_{11}\phi(t_1,t_2)]$$

where $p_{00}+p_{01}+p_{10}+p_{11} = 1$, $p_{10}+p_{11} < 1$, $p_{01}+p_{11} < 1$ and
$\psi(t_1,t_2) = [(1 - i\theta_1t_1)(1 - i\theta_2t_2)]^{-1}$. Then solving for
$\phi(t_1,t_2)$ in the above equation leads to the bivariate characteristic
function

$$\phi(t_1,t_2) = [(1-i\theta_1t_1)(1-i\theta_2t_2) - p_{11}]^{-1}\,[p_{00} + p_{10}(1-i\mu_1t_1)^{-1} + p_{01}(1-i\mu_2t_2)^{-1}]$$

where $\mu_1 = \theta_1(p_{00}+p_{01})^{-1}$ and $\mu_2 = \theta_2(p_{00}+p_{10})^{-1}$.


It can be shown that this is exactly the form of the Hawkes 
distribution and henceforth we will refer to this distribution 
as the Hawkes-Paulson distribution. For properties see Paulson 
(1973) and Hawkes (1972). 
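
A quick numerical check that the closed form solves the characteristic function equation can be written with complex arithmetic. The sketch assumes $\psi(t_1,t_2) = [(1-i\theta_1t_1)(1-i\theta_2t_2)]^{-1}$, $\mu_1 = \theta_1/(p_{00}+p_{01})$ and $\mu_2 = \theta_2/(p_{00}+p_{10})$; the function names are hypothetical.

```python
def phi(t1, t2, th1, th2, p00, p01, p10, p11):
    """Closed-form bivariate characteristic function."""
    mu1 = th1 / (p00 + p01)
    mu2 = th2 / (p00 + p10)
    bracket = p00 + p10 / (1 - 1j * mu1 * t1) + p01 / (1 - 1j * mu2 * t2)
    return bracket / ((1 - 1j * th1 * t1) * (1 - 1j * th2 * t2) - p11)

def equation_residual(t1, t2, th1, th2, p00, p01, p10, p11):
    """Absolute difference between the two sides of the functional equation."""
    args = (th1, th2, p00, p01, p10, p11)
    psi = 1 / ((1 - 1j * th1 * t1) * (1 - 1j * th2 * t2))
    rhs = psi * (p00 + p01 * phi(0.0, t2, *args) + p10 * phi(t1, 0.0, *args)
                 + p11 * phi(t1, t2, *args))
    return abs(phi(t1, t2, *args) - rhs)
```

Setting one argument to zero also shows that the marginals are exponential with means $\mu_1$ and $\mu_2$, since `phi(t1, 0, ...)` reduces to $(1 - i\mu_1 t_1)^{-1}$.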


Arnold classes. In describing these classes Arnold uses what
he calls a generalized multivariate geometric distribution. It
is easily seen that this is a reparametrized version of (6) in
the bivariate case. Thus we let $(N_1,N_2)$ be the bivariate
distribution given by (6). Then Arnold's bivariate classes $\mathcal{E}_n^{(2)}$
consist of the random variables

$$(Y_1,Y_2) = \left(\sum_{i=1}^{N_1} X_{i1},\ \sum_{i=1}^{N_2} X_{i2}\right)$$

where $(X_{i1},X_{i2})$ for $i = 1,2,\ldots$ are bivariate iid rvs with
distributions in $\mathcal{E}_{n-1}^{(2)}$ for $n \ge 1$ and where $\mathcal{E}_0^{(2)}$ consists of
$(X,X)$ where $X$ is exponential. Clearly the marginals are
exponential for all the classes. It also is not hard to show
that $\mathcal{E}_1^{(2)}$ contains the pair of independent exponentials (see
Arnold, 1975a) and also contains the Marshall and Olkin distribution
(see the nonfatal shock model discussed in Barlow and
Proschan, 1975). Furthermore, it follows from the derivations
given here of the Downton and the Hawkes distributions that these
are contained in $\mathcal{E}_2^{(2)}$ since $\mathcal{E}_1^{(2)}$ contains the independent
exponentials.

The Arnold classes of distributions have been described
using the characteristic function equation approach of Paulson 
and Uppuluri (1972b) and Paulson (1973) in Block, Paulson and 
Kohberger (1975). In this latter paper the characteristic func- 
tion equation approach has been used to derive properties of the 
distributions in this class, including descriptions of the 
standard distributions in the class, infinite divisibility of 
the distributions, moment properties and asymptotic properties. 
These results are summarized, without proof, in Block (1977b), in 
which it is also shown how the distributions in the class lead 
to multivariate shock models of the type studied in the univariate 
case by Esary, Marshall, and Proschan (1973). 


2.3 Multivariate Exponential and Related Distributions. Most
of the bivariate models in the preceding section have multivariate
($n \ge 3$) analogs. In general the ideas are similar to the
bivariate case, but the notational complexity is greatly 
increased. Without giving many details, we will briefly discuss 
the multivariate situation. 


The Freund distribution has been generalized to the multi- 
variate case by Weinman (1966) but only for identically distri- 
buted marginals. See Johnson and Kotz (1972) for details con- 
cerning this distribution. Block (1975) has considered a gener- 
alization of the Freund distribution for the case when the mar- 
ginals need not be identically distributed as well as general- 
izing the Block and Basu (1974) and the Proschan and Sullo 
(1974) models in the same paper. 


Generalizations of the Downton (1970), Hawkes (1972) and
Paulson (1973) distributions implicitly exist within the frame- 
work of the general multivariate gamma distribution of 
Krishnamoorthy and Parthasarathy (1951) (see also Krishnaiah 
and Rao, 1961; Krishnaiah, 1977) and also within the framework of 
the Arnold classes. A specific parametric form has been given in 
Hsu, Shaw and Tyan (1977). 


Recently a multivariate exponential distribution has been 
proposed by Bryant (1979) which arises in the context of certain 
cycling systems. 


2.4 Multivariate Weibull Distributions. Multivariate Weibull
distributions were discussed by Marshall and Olkin (1967a) in the
context of their discussion on multivariate exponential distributions.
Specifically they define a multivariate Weibull distribution
by assuming $(T_1,\ldots,T_n)$ has their multivariate exponential
distribution and then considering

$$\left(T_1^{1/\alpha_1},\ T_2^{1/\alpha_2},\ \ldots,\ T_n^{1/\alpha_n}\right) \tag{7}$$

where $\alpha_i > 0$ for $i = 1,\ldots,n$ which then has univariate
Weibull marginal distributions. This procedure certainly could
be extended for any multivariate exponential distribution.
Moeschberger (1974) has studied bivariate Weibull distributions
of this form, deriving properties and discussing maximum likelihood
estimation.
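
The power transformation can be illustrated in the bivariate case. The sketch below assumes the pair $(T_1,T_2)$ is generated from the fatal shock form of the Marshall and Olkin distribution with strictly positive rates, then applies $T_i \mapsto T_i^{1/\alpha_i}$; the joint survival function of the result is the exponential one evaluated at $(w_1^{\alpha_1}, w_2^{\alpha_2})$. The function names are hypothetical.

```python
import math
import random

def mo_pair(rng, l1, l2, l12):
    """One Marshall-Olkin pair via independent and common exponential shocks."""
    z12 = rng.expovariate(l12)
    return min(rng.expovariate(l1), z12), min(rng.expovariate(l2), z12)

def weibull_pair(rng, l1, l2, l12, alpha1, alpha2):
    """Apply the coordinatewise power transformation to an exponential pair."""
    t1, t2 = mo_pair(rng, l1, l2, l12)
    return t1 ** (1 / alpha1), t2 ** (1 / alpha2)

def weibull_joint_survival(w1, w2, l1, l2, l12, alpha1, alpha2):
    """P(W1 > w1, W2 > w2) obtained by transforming back to the exponential scale."""
    t1, t2 = w1 ** alpha1, w2 ** alpha2
    return math.exp(-l1 * t1 - l2 * t2 - l12 * max(t1, t2))
```

Putting `w2 = 0` gives the marginal survival function $\exp[-(\lambda_1+\lambda_{12})w_1^{\alpha_1}]$, which is Weibull with shape $\alpha_1$.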


David (1974) and Lee and Thompson (1974) have introduced
multivariate Weibull distributions of the form $(T_1,\ldots,T_n)$ where
$T_i = \min(U_J : i \in J)$, where $J \subset \{1,\ldots,n\}$, $P\{U_J > x\}$
$= \exp\{-\lambda_J x^{\alpha_J}\}$, $x > 0$ and the $U_J$ are independent. These
distributions need not have Weibull marginals if the $\alpha_J$ are not all
equal. Arnold (1967) has also considered Weibull distributions
of a similar form, but his restriction that they belong to an
additive family forces $\alpha_J = \alpha$ for all $J$. Thus these
distributions are also of the form (7).


Recently Spurrier and Weier (1979) have modified the Freund 
model using Weibull instead of exponential distributions. 


3. NONPARAMETRIC CLASSES OF DISTRIBUTIONS 


Various classes of distributions which describe the way in 
which component lifetimes wear out have been discussed by many 
authors. The most important of these classes are 1) the 
increasing failure rate (IFR) class, 2) the increasing failure 
rate average (IFRA) class, 3) the new better than used (NBU) 
class and 4) the new better than used in expectation (NBUE) class. 
These have been extensively discussed in the literature. The case 
where the lifetimes are independent (which we call the univariate
case) is discussed in the book of Barlow and Proschan (1975) and
also in the expository paper of Block and Savits (1980). The case 
where the components are dependent (called the multivariate case) 
has also been discussed in the latter paper, but since the develop- 
ment in the field is so rapid many new results have appeared since 
this last mentioned paper. We give a brief introduction to the 
univariate case, some background in the multivariate case and 
then outline the most recent developments. 


3.1 Univariate Classes. The most prominent of the nonparametric
classes used in reliability theory is the class of distributions
which have increasing failure rate. See Block and Savits
(1980) for background and motivation for this class. In the
following we let $T$ be a random variable with distribution function
$F(x)$ such that $F(0) = 0$ (the usual assumption is
$F(0-) = 0$, but for the purpose of exposition we use the above) having
density $f(x)$ (if it exists). We say $F$ has increasing failure
rate (IFR) if the survival function $\bar F(x) = P\{T > x\}$ satisfies

$$\bar F(x+t)/\bar F(x) \ \text{decreases in } x \ge 0 \text{ for } t \ge 0 \tag{8}$$

or equivalently (if the density exists)

$$r(x) = f(x)/\bar F(x) \ \text{increases in } x \ge 0.$$

A distribution $F$ has increasing failure rate average (IFRA)
if

$$\bar F(\alpha t) \ge \bar F^{\alpha}(t) \quad \text{for all } 0 < \alpha \le 1 \text{ and all } t \ge 0$$

or equivalently (if the density exists)

$$\frac{1}{t}\int_0^t r(x)\,dx \ \text{increases in } t > 0.$$

Another equivalent formulation (see Block and Savits, 1976) is

$$\int_0^\infty h^{\alpha}(x/\alpha)\,dF(x) \ge \left\{\int_0^\infty h(x)\,dF(x)\right\}^{\alpha} \quad \text{for all } 0 < \alpha \le 1 \tag{9}$$

and all nonnegative increasing functions $h$.
A distribution $F$ is new better than used (NBU) if

$$\bar F(x+t) \le \bar F(x)\bar F(t) \quad \text{for all } x \ge 0,\ t \ge 0 \tag{10}$$

and is new better than used in expectation (NBUE) if

$$\int_t^\infty \bar F(x)\,dx \le \mu\bar F(t) \quad \text{for all } t \ge 0$$

where $\mu = \int_0^\infty \bar F(x)\,dx$ is finite.
Dual versions for all the above definitions exist by 
reversing the monotonicity or the inequality. Since the treat- 
ment of these concepts is similar we shall omit it. 
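
The IFR and NBU conditions can be checked numerically on a grid for a given survival function. The sketch below is illustrative only: the Weibull survival function $\exp(-x^{k})$ is chosen as a test case (not taken from the text), the function names are hypothetical, and a finite grid can only refute a property, not prove it.

```python
import math

def weibull_sf(x, shape):
    """Survival function exp(-x**shape) of a unit-scale Weibull distribution."""
    return math.exp(-x ** shape)

def is_nbu_on_grid(sf, grid, tol=1e-12):
    """Check sf(x + t) <= sf(x) * sf(t) for all grid pairs (NBU condition)."""
    return all(sf(x + t) <= sf(x) * sf(t) + tol for x in grid for t in grid)

def is_ifr_on_grid(sf, grid, t=0.5, tol=1e-12):
    """Check that sf(x + t) / sf(x) is nonincreasing along the grid (IFR condition)."""
    ratios = [sf(x + t) / sf(x) for x in grid]
    return all(b <= a + tol for a, b in zip(ratios, ratios[1:]))

grid = [0.1 * k for k in range(51)]
```

With shape 2 the failure rate $2x$ is increasing and both checks pass; with shape 0.5 the failure rate is decreasing and both checks fail, which corresponds to the dual classes.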


3.2 Multivariate Classes. 


As in the parametric case many multivariate extensions are 
possible. Many versions of multivariate IFR, IFRA, NBU and NBUE 
have been proposed. For various IFR and IFRA extensions see 
Marshall (1974) and Esary and Marshall (1979) respectively. In 
the following we discuss the particular multivariate IFR and IFRA
notions which at this time appear to be the most important ones.
Various multivariate concepts of NBU and NBUE are given in Block
and Savits (1980). As of yet, no clear favorites have emerged
but there have been several recent papers on this subject. We
shall attempt to describe some of this development. In the
following we let $F$ be the multivariate distribution of the
random variable $T = (T_1,\ldots,T_n)$ which is assumed to satisfy
$\bar F(0) = 1$.

The distribution $F$ is said to be MIFR if

$$\bar F(x + t\mathbf{1})/\bar F(x) \ \text{decreases in } x \ge 0 \text{ for all } t \ge 0 \tag{11}$$

where $\mathbf{1} = (1,\ldots,1)$.


This generalizes (8) and has many important properties (see 
Block and Savits, 1980). Various variants of this condition are 
possible (see Marshall, 1979) but the above version best captures 
the intuitive idea of the model that components in the same
environment run for the same time (i.e., $t_1 = \cdots = t_n = t$)
but may be of different ages (i.e., $x = (x_1,\ldots,x_n)$). This concept
also satisfies many of the important basic properties that
one would expect of such a multivariate generalization. See 
Chapter 5 of Barlow and Proschan (1975) for the statements and 
proofs of these properties. 


The concept of multivariate IFRA which we now discuss is a 
generalization of (9). We say F is MIFRA if 


$$E^{\alpha}[h(T)] \le E[h^{\alpha}(T/\alpha)] \quad \text{for all } 0 < \alpha \le 1$$

and for all continuous nonnegative increasing functions $h$.
Recall that $F$ is the df of $T$. A distribution satisfying this
condition has all of the properties one would expect of a generalization
of the univariate IFRA concept. See Block and Savits
(1979b).


Esary and Marshall (1979) have proposed various other con- 
cepts of multivariate IFRA, many of them having intuitive appeal. 
Unfortunately all of them fail to satisfy at least one of the 
basic properties which the MIFRA distributions possess. This 
is demonstrated in Block and Savits (1978b). 


Block and Savits (1980) describe a wide variety of possible 
definitions for both the concepts of multivariate NBU and multi- 
variate NBUE. Some of these were definitions given by Buchanan 
and Singpurwalla (1977), others were based on the multivariate 
IFRA concepts of Esary and Marshall and still others were based 
on Laplace transform characterizations of NBU and NBUE which 
appeared in Block and Savits (1979a) and on other 
characterizations of NBU and NBUE which appeared in Block and 
Savits (1978a). At the time the paper of Block and Savits (1980) 
was written the only thing that was clear was that many multi- 
variate NBU and NBUE concepts were possible. Since then some 
order has begun to appear in this field. 


Marshall and Shaked (1979) introduce a compelling concept 
of NBU. This definition is that a random vector $T = (T_1, \ldots, T_n)$ 
is multivariate NBU if 

    $P\{T \in (\alpha + \beta)A\} \le P\{T \in \alpha A\}\, P\{T \in \beta A\}$ for all $\alpha > 0$, $\beta > 0$    (12) 

and all upper (or increasing) sets A in $\mathbb{R}^n_+$ (A is an upper 
set if $x \in A$ and $x \le y$ imply $y \in A$). It is clear that this 
is a general version of the type of definition studied by 
Buchanan and Singpurwalla (1977) and very recently by Ghosh 
and Ebrahimi (1980), i.e., 

    $\bar{F}(x + y) \le \bar{F}(x)\, \bar{F}(y)$ for all $x \ge 0$ and $y \ge 0$    (13) 

where x and y are perhaps further constrained. Furthermore, 
Marshall and Shaked have four alternate characterizations of (12). 
Several of these involve the concept that T is multivariate NBU 
if and only if g(T) is univariate NBU for all g of a certain 
type. Many other properties are proven. 
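
The product inequality for joint survival functions can be checked numerically on a grid. The sketch below is only an illustration under an assumed model (independent Weibull components, with hypothetical function names); it is not one of the constructions studied in the cited papers. With shape 2 the inequality holds on the grid, while with shape 0.5 (a DFR marginal) it fails.

```python
import itertools
import math

def survival(x, shape=2.0):
    """Joint survival function of independent Weibull(shape) lifetimes (assumed model)."""
    return math.exp(-sum(xi ** shape for xi in x))

def satisfies_nbu_inequality(x, y, shape=2.0):
    """Check Fbar(x + y) <= Fbar(x) * Fbar(y), with a tiny floating-point tolerance."""
    total = [a + b for a, b in zip(x, y)]
    return survival(total, shape) <= survival(x, shape) * survival(y, shape) + 1e-15

grid = [0.0, 0.4, 1.1, 2.5]
points = list(itertools.product(grid, repeat=2))
all_ok = all(satisfies_nbu_inequality(x, y) for x in points for y in points)
print("inequality holds on grid for shape 2:", all_ok)
```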


The classes of multivariate NBU and NBUE introduced by 
Buchanan and Singpurwalla (1977), where the NBU distributions are 
defined by properties which are cases of (13) and the NBUE dis- 
tributions are integrated versions of these, have been further 
studied by Ghosh and Ebrahimi (1980). These authors study the 
relationships among these definitions (and some variants of them), 
their properties and also demonstrate how some multivariate shock 
models give rise to them. They also make connections with some 
of the IFRA concepts. Griffith (1979) has also considered multi- 
variate shock models leading to some of these concepts. 


A recent paper by El-Neweihi, Proschan and Sethuraman (1980) 
discusses the multivariate class which arises as minimums of independent 
univariate NBU random variables. These distributions 
arise in the same way as the Marshall and Olkin distribution and 
also in the same way as one of the definitions of multivariate 
IFRA (i.e., Condition C) of Esary and Marshall (1979). These 
have properties similar to those of distributions with exponential 
minimums studied by Esary and Marshall (1979). Various relationships 
and properties are given. One of them is that, given that 
$T = (T_1, \ldots, T_n)$ has this NBU property, it has the Marshall 
and Olkin distribution if and only if, for example, $\min_{1 \le i \le n} T_i$ is 
exponential. Relationships are given between the present 
definition and various other definitions. 


3.3 Processes. The concept of a one dimensional stochastic 
process being of a type described by one of the four classes has 
been proposed by Ross (1979). Essentially he has discussed 
processes which are decreasing (or increasing) and whose first 
entry times into a state are all IFRA. For these processes, 
which he calls IFRA processes, he proves a closure theorem. He 
also studies NBU processes. NBU processes are also considered 
by El-Neweihi, Proschan and Sethuraman (1978) who also prove a 
closure theorem. 


Extensions of Ross's ideas to multivariate processes have 
been accomplished by Block and Savits (1979c). These authors 
study several types of multivariate processes having IFRA type 
properties. 


Recently Arjas (1979) has considered IFRA processes. He 
has discussed both the univariate and multivariate cases. 
SUMMARY. A new "scalar" definition of a (contour) multivariate 
hazard rate based on variation of probability distribution across 
the isoprobability contours is motivated and described. Several 
examples involving specific multivariate distributions justifying 
the usefulness of this definition are presented. General struc- 
tures of "constant" contour multivariate hazard rates as well as 
increasing contour multivariate hazard rates are described. 


KEY WORDS. hazard rates, isoprobability contours, logarithmic 
transform, power transform, multivariate distributions, exponen- 
tial distribution, characterizations. 


1. INTRODUCTION 


Numerous devices and algorithms for generating multivariate 
distributions have been developed over the past few decades. Some 
of these are briefly sketched below without commenting on their 
relative merits. 


1) Perhaps the earliest method was to formally generalize 
a system or equation defining a univariate distribution to the 
multivariate case. The papers extending Pearson's system of 
distributions, e.g., van Uven (1947), Sagrista (1952), and Steyn 
(1960), can be cited as examples of this approach. 


2) The most direct approach - modelling - is presented in 
the works of Freund (1961) and Marshall and Olkin (1967a,b), 
whereby a multivariate distribution is obtained by indicating a 
specific stochastic model comprising a system of several "compo- 
nents" representing the random variables under consideration. 
The multinomial distribution is a primary motivation for such a 
construction. 


3) Another method defines a multivariate distribution by 
explicitly specifying the mathematical relations between the joint 
distribution and its marginals. This was initiated by Fréchet 
(1951) and extended by Morgenstern (1956), Gumbel (1960), Farlie 
(1960), and Johnson and Kotz (1975a), among others. 


4) Still another approach postulates a specified multi- 
variate form for the density by reproducing in the multivariate 
setting the functional form of a particular univariate family of 
densities. This includes linear-exponential type distributions 
originated by Bildikar and Patil (1968), quadratic-exponential 
type by Day (1969) and multivariate θ-generalized distributions 
of Goodman and Kotz (1973). 


5) Recently, Higgins (1975) has shown that given any 
"reasonable" set of contours generated by the relation t(v) = y, 
$v \in \mathbb{R}^n$, y real, and any continuous univariate density g, there 
exists a (unique) multidimensional density h with corresponding 
r.v., say X, such that the density of t(X) is g and the 
isoprobability contours of h are determined by t(v) = y. Thus, 
a transformation is displayed which maps pairs (g,t) into a 
large class of multivariate densities. Higgins' procedure yields 
a variety of multivariate distributions. However, in this "semi- 
modelling" approach, it is necessary to extract from the resulting 
multivariate distribution, the possible hidden dependencies 
between the component variables. This task is seldom easy, but a 
promising avenue is to trace the relation between the joint hazard 
rates corresponding to the multivariate distribution and the rates 
of the marginal components (see, e.g., Johnson and Kotz, 1975b). 


A related method to determine the interrelation among the 
component variables of the multivariate distribution is to 
investigate how certain characteristics of a distribution, in 
addition to the density itself, change when a univariate distribution 
is extended to a multivariate setting. One such characteristic 
is the "hazard rate". This problem is investigated in the 
present paper. 


2. MOTIVATION OF THE DEFINITION 


Based on Higgins' (1975) results and our previous generali- 
zations of univariate generalized normal distributions using con- 
tour concepts (Goodman and Kotz, 1973), we shall present a new defin- 
ition of hazard rate for both a univariate and multivariate 
setting. 


The "ordinary" univariate hazard rate for a random variable 
(r.v.) with density f and distribution function F is defined 
as 

    $-\frac{d \log(1 - F(t))}{dt} = \frac{f(t)}{1 - F(t)}$ 

for all values of t for which the denominator is positive. Conversely, 
$1 - F(t) = \exp\{-\int_0^t \frac{f(s)}{1 - F(s)}\,ds\}$, $t \ge 0$. 
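
The two-way relation between the hazard rate and the survival function can be checked numerically. The following sketch assumes a Weibull distribution with shape k purely as an example; the function names are hypothetical. It integrates the hazard rate with a midpoint rule and compares the exponential of the negative integral with 1 - F(t).

```python
import math

def weibull_pdf(t, k):
    return k * t ** (k - 1.0) * math.exp(-(t ** k))

def weibull_cdf(t, k):
    return 1.0 - math.exp(-(t ** k))

def hazard(t, k):
    """Ordinary hazard rate f(t) / (1 - F(t))."""
    return weibull_pdf(t, k) / (1.0 - weibull_cdf(t, k))

def survival_from_hazard(t, k, steps=2000):
    """exp(-integral of the hazard over [0, t]), using the midpoint rule."""
    width = t / steps
    integral = sum(hazard((i + 0.5) * width, k) for i in range(steps)) * width
    return math.exp(-integral)

t0, shape = 1.5, 2.0
print(survival_from_hazard(t0, shape), 1.0 - weibull_cdf(t0, shape))
```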


Multivariate generalizations of hazard rates have been pro- 
posed by Harris (1970), Basu (1971), Brindley and Thompson (1972), 
Block (1973, 1977), Johnson and Kotz (1975b), and Marshall (1975), 
among others. The last three generalizations are based on vector- 
valued definitions. 


Here, we base our univariate as well as multivariate defin- 
itions of hazard rate on isoprobability contours (or sets). 


First we present a result closely related to Higgins' original 
theorem (1975). We shall obtain Higgins' original result with a 
uniqueness property and a partial converse to his theorem (see 
Remark below). 


Theorem 1. Let n be any fixed positive integer. Let $t: D \to \mathbb{R}$, 
$D \subset \mathbb{R}^n$, be any nonsingular function, i.e., $dt(x)/dx \ne 0$ for all 
$x \in D$, such that $V_{t,y} = \mathrm{vol}\{x \mid x \in D \text{ and } t(x) \le y\}$ is finite, nowhere 
constant in y and everywhere differentiable in y, for each y 
in range(t). (We assume range(t) to be some finite or infinite 
interval.) Let g be any continuous univariate nowhere-zero 
density over range(t). Then there exists a unique multivariate 
density $h = \phi(t)$, corresponding to r.v. U, say, such that: (1) 
the set of isoprobability contours of h is the same as that of 
t, i.e., is of the form $\{x \mid t(x) = \text{constant}\}$; (2) the r.v. t(U) 
has the univariate density g; and (3) the specific relation 
between h, t, and g is 

    $h(x) = \phi(t(x)) = g(t(x)) \Big/ \left.\frac{dV_{t,y}}{dy}\right|_{y = t(x)}$    (1) 
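
A minimal sketch of the construction in Theorem 1, under assumptions chosen only for illustration: n = 2, t(x) = x_1^2 + x_2^2 (so that the volume below level y is the area pi*y of a disc), and g the standard exponential density. The resulting h is g(t(x)) divided by dV/dy, and the code checks numerically that h integrates to one. All names are hypothetical.

```python
import math

def g(y):
    """Assumed univariate density of t(U): standard exponential."""
    return math.exp(-y)

def t(x1, x2):
    """Assumed contour function; its level sets are circles."""
    return x1 * x1 + x2 * x2

def dV_dy(y):
    """V_{t,y} = pi * y is the area of the disc of radius sqrt(y), so dV/dy = pi."""
    return math.pi

def h(x1, x2):
    """Density built from the pair (g, t): h(x) = g(t(x)) / (dV/dy at y = t(x))."""
    y = t(x1, x2)
    return g(y) / dV_dy(y)

def integrate_h(limit=6.0, n=300):
    """Midpoint rule for the integral of h over the square [-limit, limit]^2."""
    step = 2.0 * limit / n
    total = 0.0
    for i in range(n):
        x1 = -limit + (i + 0.5) * step
        for j in range(n):
            x2 = -limit + (j + 0.5) * step
            total += h(x1, x2)
    return total * step * step

print(integrate_h())
```

With these choices h is the bivariate normal density with independent components of variance 1/2, whose contours are indeed the level sets of t.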
Remark. Let h be a multivariate density over $D \subset \mathbb{R}^n$ with the 
same properties as those of the function t stated in Theorem 1. 
Let t be a monotonic function, $\phi^{-1}$, of h. Then: (1) the 
set of isoprobability contours of h is the same as that of t 
and (2) the r.v. t(U) has density g determined by the 
following integrals (where U is a r.v. possessing density h): 

    $g(y) = \int \frac{h(s)}{|dt(s)/ds|}\,ds = \phi(y)\, \frac{dV_{t,y}}{dy}$,    (2) 

where the integration is over the set $\{s \mid t(s) = y\}$. 


Sketch of Proofs. The essence of Theorem 1 is Higgins' result. 
Uniqueness follows from the additional requirement that $V_{t,y}$ is 
nowhere constant (equivalently $dV_{t,y}/dy > 0$). The Remark 
follows from equation (2) using the result 

    $\frac{dV_{t,y}}{dy} = \int \frac{1}{|dt(s)/ds|}\,ds$ 

with the integration over $\{s \mid t(s) = y\}$. 


We now define, for a given multivariate density h, the 
concept of a hazard function. According to Theorem 1, a pair 
(g,t) uniquely defines h. However, given h, we have in 
general an infinite class of corresponding pairs (g,t) which 
are all compatible (in the sense of Theorem 1) with the given h. 
In order to eliminate the ambiguity, we must choose a particular 
t, which in turn uniquely determines g. The problem of non-uniqueness 
of the density g (via different choices of t(x)) 
is not unlike the one encountered in Bayesian inference when 
different choices of the prior distribution lead to different 
posterior distributions and, consequently, to estimators with 
different structural properties. A similar situation holds in 
connection with ancillary statistics (see, e.g., Fu, 1974). The 
devices used in Bayesian inference for choosing a natural (or 
"conjugate") prior compatible with the distribution at hand can 
readily be adapted in our case. 


Basically, we are dealing with four types of multivariate 
distributions: 
1) Those of the general exponential type, which includes 
linear exponential families (Bildikar and Patil, 1968), θ-generalized 
multivariate distributions (Goodman and Kotz, 1973), 
and various exponential multivariate distributions, e.g., Gumbel 
(1960), Freund (1961), Marshall and Olkin (1967). The natural 
choice for t(x) for this family is the monotonically decreasing 
logarithmic transformation t(x) = -log(h(x)/c), where c is 
an appropriate constant - affecting only the location and scale 
of the resulting univariate density g. This transformation is 
meaningful in the sense that it yields the information content 
(or the log-likelihood) of the random sample $X = (X_1, X_2, \ldots, X_n)$ 
distributed with density h(x). The basic function of a transform 
is to assure the finiteness of the volume $V_{t,y}$ in 
Theorem 1. (A conceptually analogous device is used by Fu, 1974.) 


2) The second class of multivariate distributions is of the 
"power-type" form which includes multivariate t, multivariate 
Pareto, multivariate Burr, etc. These distributions are also 
defined on an infinite domain. Here, the natural choice for 
t(x) is the monotonically decreasing power transform which does 
not distort the functional form of the multivariate density h, 
as far as the variable and the parameters are concerned, but 
simplifies the form of the constant part related to the location 
and scale parameters. This transform also does not affect the 
information contained in the sample $X = (X_1, \ldots, X_n)$ of the 
multidimensional variable with the p.d.f. h(x), and serves to 
guarantee the finiteness of the volume $V_{t,y}$. 


3) The third class of multivariate distributions has the 
form $x^{\alpha} \exp(-\beta x^{\gamma})$ defined on positive orthants, as exemplified 
by the multivariate gamma distributions. This case can also be 
dealt with efficiently using information-content-type monotonic 
decreasing transformations t(x) = -log(h(x)/c). In these cases, 
however, closed expressions for the ordinary hazard rate as well 
as for the contour hazards are not usually available, although 
the functions are computable using numerical techniques. 


4) Finally, we have distributions on a finite domain cor- 
responding to the Beta-type distribution in the univariate case, 
and to Dirichlet-type distributions in the multivariate case. 
These distributions do not require any transforms to assure the 
finiteness of the volume and the natural choice here is t(x) = 
h(x)/c. As in case 3), closed expressions are not always avail- 
able but both hazard rates are computable. 
We shall now present the definition and several examples of 
distributions with the corresponding contour hazard rates and 
compare them, in univariate cases, with the "ordinary" rates. 


3. THE BASIC DEFINITION 


Let h be any given multivariate (or univariate) density 
over some appropriate domain $D \subset \mathbb{R}^n$ subject to the restrictions 
of Theorem 1, with corresponding r.v. U. Let t be a monotonic 
(increasing or decreasing, depending on the form of h) function, 
$\phi^{-1}$, of h. Let the density of t(U) be denoted by g. Via 
Theorem 1 we have the relations 

    $g(y) = \int \frac{h(s)}{|dt(s)/ds|}\,ds = \phi(y) \int \frac{1}{|dt(s)/ds|}\,ds = \phi(y)\, \frac{dV_{t,y}}{dy}$, 

with the integration over $\{s \mid t(s) = y\}$. Denoting by G the 
cumulative distribution function corresponding to g, we define 
the (relative) contour hazard rate function Q as the "ordinary" 
hazard rate of the univariate density g, i.e., 
Q(y) = g(y)/(1-G(y)). 


If the random variable with the density h(x) is restricted 
to $\mathbb{R}^n_+$ and represents joint life-times of a system consisting 
of several components, then Q(y) can be viewed as the 
rate of mortality on contour level y, given that the components 
have not all failed within the n-coordinate time system 
up to level y. Usually, the most efficient way of computing g, 
and hence Q, is to determine $V_{t,y}$ (and $dV_{t,y}/dy$) directly 
(rather than in surface integral form) and use the relation 

    $g(y) = \phi(y)\, \frac{dV_{t,y}}{dy}$. 


4. EXAMPLES AND COMPARISONS 


4.1 Exponential-Type Multivariate Distributions. 


a) De Simoni's (1968) family of multivariate distributions is 
given by 

    $h(x) = c \exp\{-[(x-\mu)^T A (x-\mu)]^{a}\}$, 

where c > 0, A is a positive definite matrix, $\mu \in \mathbb{R}^n$, a > 0. 
Accordingly, $t(x) = -\log(h(x)/c) = [(x-\mu)^T A (x-\mu)]^{a}$ and the 
volume is 

    $V_{t,y} = \mathrm{vol}\{x \mid t(x) \le y\}$ 
    $= \mathrm{vol}\{x \mid (x-\mu)^T A (x-\mu) \le y^{1/a}\}$ 
    $= \frac{\pi^{n/2}}{(\det A)^{1/2}\, \Gamma(1+n/2)}\, y^{n/2a}$. 

Thus $\frac{dV_{t,y}}{dy} = \frac{\pi^{n/2} (n/2a)}{(\det A)^{1/2}\, \Gamma(1+n/2)}\, y^{(n/2a)-1}$, and $\phi(y) = c e^{-y}$. 
Consequently, 

    $g(y) = \phi(y)\, \frac{dV_{t,y}}{dy} = c'\, y^{(n/2a)-1} e^{-y}$, y > 0, 

and 

    $G(y) = c' \int_0^y s^{(n/2a)-1} e^{-s}\,ds$. 

Using the notation $\Gamma(n/2a, y) = \int_y^{\infty} s^{(n/2a)-1} e^{-s}\,ds$ for a form 
of the incomplete gamma function, we obtain 

    $Q(y) = \frac{g(y)}{1-G(y)} = \frac{y^{(n/2a)-1} e^{-y}}{\Gamma(n/2a, y)}$. 

Thus Q is also the ordinary hazard rate of a univariate gamma 
distribution. 
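
The recipe "multiply the density level by the derivative of the volume, then take the ordinary hazard rate" can be sketched numerically. The code below is an illustration under stated assumptions, not the authors' computation: det A is set to 1, the tail integral is truncated at an arbitrary upper limit and evaluated by the midpoint rule, and all function names are hypothetical. For the elliptical exponential-power family above, the choice n = 2, a = 1 should give a constant rate 1, and n = 4, a = 1 should give y/(y + 1).

```python
import math

def contour_hazard(phi, dV_dy, y, upper=60.0, steps=20000):
    """Approximate Q(y) = g(y) / (1 - G(y)) with g = phi * dV/dy.

    The tail of g over [y, upper] is integrated by the midpoint rule. Any
    normalizing constant of g cancels in the ratio, so phi may be unnormalized.
    """
    def g(s):
        return phi(s) * dV_dy(s)

    width = (upper - y) / steps
    tail = sum(g(y + (i + 0.5) * width) for i in range(steps)) * width
    return g(y) / tail

def elliptical_dV_dy(y, n, a, det_A=1.0):
    """Derivative of the ellipsoid volume vol{x : [(x-mu)' A (x-mu)]^a <= y} in y."""
    k = n / (2.0 * a)
    const = math.pi ** (n / 2.0) / (math.sqrt(det_A) * math.gamma(1.0 + n / 2.0))
    return const * k * y ** (k - 1.0)

def phi(y):
    """Density level on the contour t(x) = y, up to the constant c."""
    return math.exp(-y)

q_n2 = contour_hazard(phi, lambda s: elliptical_dV_dy(s, n=2, a=1.0), 0.5)
q_n4 = contour_hazard(phi, lambda s: elliptical_dV_dy(s, n=4, a=1.0), 1.0)
print(q_n2, q_n4)
```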


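b) The θ-Generalized Multivariate Normal family (Goodman and 
Kotz, 1973) has density 

    $h(x) = c \exp\{-(\|A(x-\mu)\|_{\theta})^{\theta}\}$, 

where c > 0, A is a non-singular n by n matrix, $\mu \in \mathbb{R}^n$, $\theta > 0$, 
and 

    $\|v\|_{\theta} = \left(\sum_{i=1}^{n} |v_i|^{\theta}\right)^{1/\theta}$. 

Accordingly, $t(x) = -\log(h(x)/c) = (\|A(x-\mu)\|_{\theta})^{\theta}$ and 

    $V_{t,y} = \mathrm{vol}\{x \mid t(x) \le y\}$ 
    $= \mathrm{vol}\{x \mid \|A(x-\mu)\|_{\theta} \le y^{1/\theta}\}$ 
    $= \frac{[2\Gamma(1+\theta^{-1})]^n}{|\det A|\, \Gamma(1+n\theta^{-1})}\, y^{n/\theta}$, 

with the derivative 

    $\frac{dV_{t,y}}{dy} = \frac{[2\Gamma(1+\theta^{-1})]^n (n/\theta)}{|\det A|\, \Gamma(1+n\theta^{-1})}\, y^{(n/\theta)-1}$, and $\phi(y) = c e^{-y}$. 

Consequently, $g(y) = \phi(y)\, \frac{dV_{t,y}}{dy} = c'\, y^{(n/\theta)-1} e^{-y}$, where 
$c' = c\,(n/\theta)\,[2\Gamma(1+\theta^{-1})]^n / (|\det A|\, \Gamma(1+n\theta^{-1}))$. Analogously to the previous case, 

    $Q(y) = \frac{g(y)}{1-G(y)} = \frac{y^{(n/\theta)-1} e^{-y}}{\Gamma(n\theta^{-1}, y)}$. 

Again, we note that the contour hazard rate is the same as the 
ordinary univariate hazard rate of a gamma distribution (but with 
a parameter different from that in (a)). 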


c) The univariate standardized exponential density $h(x) = e^{-x}$, 
x > 0, yields t(x) = x with $dV_{t,y}/dy = 1$. The contour hazard 
rate coincides in this case with the ordinary hazard rate, both 
equal to the constant 1. (See also Section 5.) 


4.2 Power-Type Multivariate Distributions on an Infinite Range. 
a) The Pareto type I multivariate family has density 

    $h(x) = c \left(\sum_{i=1}^{n} \frac{x_i}{a_i} + a\right)^{-b}$, 

where $x_i > a_i > 0$; c, b > 0. Accordingly, 

    $t(x) = \left(\frac{h(x)}{c}\right)^{-1/b} - a = \sum_{i=1}^{n} \frac{x_i}{a_i}$, 

and the volume is $V_{t,y} = \left(\prod_{i=1}^{n} a_i\right) \frac{(y-n)^n}{n!}$. Thus 
$\frac{dV_{t,y}}{dy} = \left(\prod_{i=1}^{n} a_i\right) \frac{(y-n)^{n-1}}{(n-1)!}$ and $\phi(y) = c\,(y+a)^{-b}$. 
Consequently, 

    $g(y) = \phi(y)\, \frac{dV_{t,y}}{dy} = c'\,(y+a)^{-b} (y-n)^{n-1}$, y > n. 

The constant c' is determined from the fact that g is a 
univariate density function. This leads to a computable integral 
for 1-G(y); and we note that g is a translated form of the 
univariate F-density. Hence Q will coincide with the ordinary 
hazard rate of this variable. In the particular case when the 
parameter a is of the "natural" form -n+1 and $b = \alpha + n$, we 
have 

    $g(y) = c'\,(y-n+1)^{-(\alpha+n)} (y-n)^{n-1}$, y > n, 

and (by repeated integration by parts) 

    $1 - G(y) = c' \sum_{j=0}^{n-1} \frac{(n-1)^{[j]}\, (y-n)^{n-j-1}}{(\alpha+n-1)^{[j+1]}\, (y-n+1)^{\alpha+n-j-1}}$, 

where $x^{[j]} = x(x-1)\cdots(x-j+1)$; $x^{[0]} = 1$. Thus, in this case, 
Q is the "ordinary" hazard rate of a univariate F-density with 
parameters 2n and $2\alpha$ which has been shifted to the right by 
a value n. 
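
The integration-by-parts sum for the tail can be checked against direct numerical integration. The sketch below assumes the "natural" parameter case, with the tail integrand (s-n+1)^(-(alpha+n)) (s-n)^(n-1) and the constant c' omitted on both sides; the function names and the substitution used for the numerical integral are choices made for this illustration only.

```python
def falling(x, j):
    """Descending factorial x^{[j]} = x (x-1) ... (x-j+1), with x^{[0]} = 1."""
    out = 1.0
    for i in range(j):
        out *= x - i
    return out

def tail_closed_form(y, n, alpha):
    """Finite sum obtained by repeated integration by parts (without the constant c')."""
    u = y - n
    return sum(
        falling(n - 1, j) * u ** (n - 1 - j)
        / (falling(alpha + n - 1, j + 1) * (u + 1.0) ** (alpha + n - 1 - j))
        for j in range(n)
    )

def tail_numeric(y, n, alpha, steps=20000):
    """Midpoint rule for the same tail after substituting v = 1 / (s - n + 1).

    The substitution maps s in [y, infinity) onto v in (0, 1/(y-n+1)] and turns
    the integrand into (1 - v)^(n-1) * v^(alpha-1).
    """
    top = 1.0 / (y - n + 1.0)
    width = top / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * width
        total += (1.0 - v) ** (n - 1) * v ** (alpha - 1.0)
    return total * width

print(tail_closed_form(4.0, 3, 2.5), tail_numeric(4.0, 3, 2.5))
```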


b) The multivariate t distribution has density 

    $h(x) = c\,((x-\mu)^T A (x-\mu) + a)^{-b}$, 

where c, b > 0, a is real, A is a positive definite n by n 
matrix, and $\mu \in \mathbb{R}^n$. Accordingly, 

    $t(x) = \left(\frac{h(x)}{c}\right)^{-1/b} - a = (x-\mu)^T A (x-\mu)$, 

and the volume is $V_{t,y} = \frac{\pi^{n/2}}{(\det A)^{1/2}\, \Gamma(1+n/2)}\, y^{n/2}$. Hence 
$\frac{dV_{t,y}}{dy} = \frac{n\, \pi^{n/2}}{2\, (\det A)^{1/2}\, \Gamma(1+n/2)}\, y^{(n/2)-1}$ and $\phi(y) = c\,(y+a)^{-b}$. 
Thus, 

    $g(y) = \phi(y)\, \frac{dV_{t,y}}{dy} = c'\,(y+a)^{-b}\, y^{(n/2)-1}$, y > 0. 

The contour hazard rate is similar to that in a) but the parameters 
are different (due to the presence of the quadratic form 
rather than the linear one in the expression for the density). 


4.3 Mixed Type (Power-Exponential) Distributions Defined on an 
Infinite Domain. 
a) The generalized univariate gamma distribution has density 

    $h(x) = c\, x^{\alpha} \exp(-\beta x^{\gamma})$, $x \ge 0$, 

where c, β, γ > 0, α > -1. Accordingly, 

    $t(x) = -\log(h(x)/c) = \beta x^{\gamma} - \alpha \log x$. 

Then 

    $V_{t,y} = \mathrm{vol}\{x \mid x \ge x_0,\ t(x) \le y\} + \mathrm{vol}\{x \mid x \le x_0,\ t(x) \le y\} = t_1^{-1}(y) - t_2^{-1}(y)$, 

where $t_1^{-1}$ is the increasing branch of $t^{-1}$ over $[x_0, +\infty)$ while $t_2^{-1}$ 
is a decreasing branch over $(0, x_0]$, and $x_0$ satisfies the relation 
$\log x_0 = \gamma^{-1} \log(\alpha/(\beta\gamma))$. It follows easily that 

    $\frac{dV_{t,y}}{dy} = \frac{dt_1^{-1}(y)}{dy} - \frac{dt_2^{-1}(y)}{dy} = \left.\frac{1}{dt(x)/dx}\right|_{x=t_1^{-1}(y)} - \left.\frac{1}{dt(x)/dx}\right|_{x=t_2^{-1}(y)}$, 

while $\phi(y) = c e^{-y}$. Thus 

    $g(y) = c e^{-y} \left[\frac{dt_1^{-1}(y)}{dy} - \frac{dt_2^{-1}(y)}{dy}\right]$. 

Consequently, 

    $1 - G(y) = \int_{s=y}^{\infty} g(s)\,ds = c \int_{s=y}^{\infty} e^{-s} \left[\frac{dt_1^{-1}(s)}{ds} - \frac{dt_2^{-1}(s)}{ds}\right] ds$ 
    $= c \int_{s=y}^{\infty} e^{-s} \left.\frac{1}{dt(u)/du}\right|_{u=t_1^{-1}(s)} ds - c \int_{s=y}^{\infty} e^{-s} \left.\frac{1}{dt(u)/du}\right|_{u=t_2^{-1}(s)} ds$. 

The two integrals on the preceding line are found to be 

    $\int_{x=t_1^{-1}(y)}^{\infty} x^{\alpha} e^{-\beta x^{\gamma}}\,dx$ and $-\int_{x=0}^{t_2^{-1}(y)} x^{\alpha} e^{-\beta x^{\gamma}}\,dx$, 

respectively. Denoting $\Gamma(a,x) = \int_x^{\infty} e^{-t} t^{a-1}\,dt$ with $\Gamma(a,0) = \Gamma(a)$, 
we obtain after straightforward calculations 

    $Q(y) = g(y)/[1-G(y)] = A e^{-y} (B_1^{-1} - B_2^{-1}) / (C_1 - C_2 + D)$ 

where 

    $A = \gamma \beta^{[\alpha+1]/\gamma}$, 

    $B_i = \beta\gamma\,[t_i^{-1}(y)]^{\gamma-1} - \alpha / t_i^{-1}(y)$, i = 1,2, 

    $C_i = \Gamma([\alpha+1]/\gamma,\ \beta (t_i^{-1}(y))^{\gamma})$, i = 1,2, 

    $D = \Gamma([\alpha+1]/\gamma)$. 


In the particular case of a gamma density given by $h(x) = x e^{-x}$, 
x > 0, we have t(x) = x - log(x) and $\phi(y) = e^{-y}$. Let 
$x_i = t_i^{-1}(y)$, i = 1,2, be the inverse functions of y = t(x) on 
either side of x = 1. Thus, 

    $dV_{t,y}/dy = x_1/(x_1-1) - x_2/(x_2-1)$ 

and hence the contour hazard rate is explicitly given by the 
formula 

    $Q(y) = \frac{[x_1/(x_1-1) - x_2/(x_2-1)]\, e^{-y}}{(x_1+1)e^{-x_1} - (x_2+1)e^{-x_2} + 1}$. 


In this case of a non-monotonic univariate distribution, it 
is still meaningful to talk about the life failure between a 
and b, 0 < a < b < ∞, corresponding to the inspection of the 
random component when we don't have information about an earlier 
or a later failure, yet we know that no failure has occurred 
between the times a and b. The contour set collapses here 
into a pair of points since a typical contour set is 

    $\{t_1^{-1}(y),\ t_2^{-1}(y)\}$. 
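
For the gamma density x e^{-x} the two inverse branches of t(x) = x - log(x) have no closed form, but they are easy to find by bisection. The sketch below is an illustration with hypothetical function names: it computes both branches, the contour density g, the tail 1 - G as the probability mass outside the interval between the two branch points, and their ratio Q. A finite-difference check confirms that g is minus the derivative of the tail.

```python
import math

def t(x):
    """Contour function for h(x) = x * exp(-x): t(x) = -log h(x) = x - log x."""
    return x - math.log(x)

def branches(y):
    """Return (x1, x2) with t(x1) = t(x2) = y and x2 < 1 < x1, for y > 1."""
    lo, hi = 1.0, y + 50.0            # t is increasing on [1, infinity)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t(mid) < y:
            lo = mid
        else:
            hi = mid
    x1 = 0.5 * (lo + hi)
    lo, hi = math.exp(-y), 1.0        # t is decreasing on (0, 1]
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t(mid) > y:
            lo = mid
        else:
            hi = mid
    x2 = 0.5 * (lo + hi)
    return x1, x2

def contour_density(y):
    """g(y) = exp(-y) * dV/dy with dV/dy = x1/(x1-1) - x2/(x2-1)."""
    x1, x2 = branches(y)
    return math.exp(-y) * (x1 / (x1 - 1.0) - x2 / (x2 - 1.0))

def contour_survival(y):
    """1 - G(y): mass of x * exp(-x) to the right of x1 plus the mass to the left of x2."""
    x1, x2 = branches(y)
    return (x1 + 1.0) * math.exp(-x1) + 1.0 - (x2 + 1.0) * math.exp(-x2)

def contour_hazard_gamma(y):
    return contour_density(y) / contour_survival(y)

print(contour_hazard_gamma(2.0))
```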
5. CHARACTERIZING PROPERTIES OF CONSTANT AND INCREASING 
CONTOUR HAZARD RATES 


The following question is of interest. Under a logarithmic 
(or a power) transform for obtaining t, what is the family of 
densities h whose contour hazard rates coincide with the 
ordinary hazard rate of a given univariate density f? In 
particular, what is the family which corresponds to the "pivotal" 
density $f(y) = e^{-y}$, y > 0, whose ordinary hazard rate is the 
constant 1? The problem is of practical interest since it yields 
a systematic procedure of generating families of multivariate 
distributions with desired properties. 


Consider the equation $g(y) = f(y) = e^{-y}$ or, equivalently, 
$\phi(y)\, \frac{dV_{t,y}}{dy} = e^{-y}$. Thus 

    $\frac{dV_{t,y}}{dy} = \frac{e^{-y}}{\phi(y)}$.    (3) 


Once the t's satisfying the above equation are determined, the 
h's are immediately obtained as $h(x) = \phi(t(x))$. 


a) For the logarithmic transforms: 

    $h(x) = \phi(t(x)) = c e^{-t(x)}$, 

and $t(x) = -\log[h(x)/c]$, $\phi(y) = c e^{-y}$. 

Thus (3) becomes 

    $\frac{dV_{t,y}}{dy} = \frac{e^{-y}}{c e^{-y}} = \frac{1}{c}$    (4) 

and hence 

    $V_{t,y} = (1/c)\,y + d$.    (4') 


b) Analogously for a power transform we have: 

    $h(x) = \phi(t(x)) = c\,(t(x) + a)^{-(\alpha-1)}$, 

and $t(x) = \left(\frac{h(x)}{c}\right)^{-1/(\alpha-1)} - a$. 

Thus (3) becomes 

    $\frac{dV_{t,y}}{dy} = \frac{e^{-y}}{\phi(y)}$    (5) 

and hence, 

    $\frac{dV_{t,y}}{dy} = \frac{1}{c}\,(y+a)^{(\alpha-1)} e^{-y}$.    (5') 

If h is a one-dimensional density and t is differentiable, 
non-negative and vanishes at some finite point and has isolated 
maxima or minima, if any, (with $V_{t,y}$ always finite and satisfying 
the conditions stipulated in Theorem 1), then it is easy to 
see that $dV_{t,y}/dy$ is the sum of "length" terms of the form 

    $\frac{dt_1^{-1}(y)}{dy} - \frac{dt_2^{-1}(y)}{dy}$, 

where $t_1^{-1}$ is an increasing inverse branch of t to the right 
of the local minima and $t_2^{-1}$ is the decreasing inverse branch 
to the left; (at an end point these terms may possibly be 
replaced by a suitable positive constant). Equivalently, a 
typical term in the sum is 

    $\left.\frac{1}{|dt(s)/ds|}\right|_{s=t_1^{-1}(y)} + \left.\frac{1}{|dt(s)/ds|}\right|_{s=t_2^{-1}(y)}$, 

where the summation is over all the minima. Clearly, in a 
neighborhood of an extremum these sums grow indefinitely while in 
equation (4) the constant 1/c is bounded for all y. This 
contradiction yields that, under the assumptions above, t must 
be monotonically increasing. This implies that $dt^{-1}(y)/dy = 1/c$ 
and thus, t(y) = cy. (The boundary condition assures that 
d = 0 in (4').) Hence we have shown that the family of univariate 
densities given by $h(x) = c e^{-cx}$, c > 0, $x \ge 0$, is 
the only family of one-dimensional densities with a constant 
contour hazard rate (under logarithmic transform). This is also 
a characterizing property of the ordinary hazard rate. 


In the case of two or more dimensions, an analogous reasoning 
shows that t satisfying (4) cannot have isolated local 
minima (the rate of change of the volume under the curve at these 
points is unbounded). Therefore a class of functions t satisfying 
(4) in the case of n-dimensional densities under a logarithmic 
transform is given by $t(x) = (\gamma^T x)^n$ where the positive 
vector γ satisfies 

    $|\gamma| = \prod_{i=1}^{n} \gamma_i = 1/n!$ 

but is otherwise arbitrary. Indeed we have 

    $V_{t,y} = \mathrm{vol}\{x \ge 0 \mid (\gamma^T x)^n \le y\} = \mathrm{vol}\{x \ge 0 \mid \gamma^T x \le y^{1/n}\} = \frac{(y^{1/n})^n}{n!\,|\gamma|} = y$. 

Thus $h(x) = e^{-(\gamma^T x)^n}$ is a class of n-dimensional densities 
having a constant contour hazard rate (under the logarithmic 
transform for t). Observe that for $n \ge 2$ the multivariate 
exponential density $h(x) = c e^{-\gamma^T x}$ for any γ > 0 does not 
generally possess a constant contour hazard rate. (Compare with 
example (b) in Section 4.1.) This is at some variance with the 
conclusion reached for the case of a vector-valued multivariate 
hazard rate as defined by Johnson and Kotz (1975b). Analogously, 
one can determine the structure of one-dimensional or multidimensional 
densities h with an ICHR (increasing contour hazard 
rate) and DCHR (decreasing contour hazard rate). We shall 
briefly discuss the ICHR case under the logarithmic transform. 


We have $h(x) = c e^{-t(x)}$ and, as before, 

    $g(y) = c e^{-y}\, \frac{dV_{t,y}}{dy}$. 

Hence 

    $Q(y) = \frac{g(y)}{1-G(y)} = \frac{e^{-y}\, dV_{t,y}/dy}{\int_{s=y}^{\infty} e^{-s}\, (dV_{t,s}/ds)\,ds}$ 

is required to be increasing in y. A sufficient condition for this 
is that $\frac{d^2 V_{t,y}}{dy^2} \ge \frac{dV_{t,y}}{dy}$. The 
latter is satisfied by any real valued differentiable function 
$r: \mathbb{R}_+ \to \mathbb{R}_+$ such that 

    $\frac{dr(y)}{dy} \ge 0$ and $\frac{dV_{t,y}}{dy} = r(y) e^{y}$.    (6) 

However the class of t's satisfying (6) is, by the above argument, 
represented as 

    $t(x) = w^{-1}((\mathbf{c}^T x)^n)$, where $\mathbf{c} = (c_1, \ldots, c_n)$, 

    $|\mathbf{c}| = \prod_{i=1}^{n} c_i = 1/n!$ and $w(y) = \int_{s=0}^{y} r(s) e^{s}\,ds$. 

(w is a monotonically increasing function.) Consequently, a 
family of densities h with an ICHR is given by 

    $h(x) = c \exp\{-w^{-1}((\mathbf{c}^T x)^n)\}$. 
We now state briefly the corresponding results for the case 
of densities under power-type transforms. Since the function in 
the r.h.s. of (5) is uniformly bounded, the above reasoning concerning 
the branches of the "inverse" of t is valid in this 
case as well and we have as above that, under mild restrictions 
on t, the only one-dimensional t's satisfying (5') are monotonically 
increasing. Thus 

    $t^{-1}(y) = V_{t,y} = \frac{1}{c} \int_{s=0}^{y} (s+a)^{\alpha-1} e^{-s}\,ds = \frac{e^{a}}{c}\,[\Gamma(\alpha, a) - \Gamma(\alpha, a+y)] = w(y)$, 

say. Since w(·) is a monotone increasing function, $t(y) = w^{-1}(y)$ and 
$h(x) = c\,[w^{-1}(x)+a]^{-(\alpha-1)}$, x > 0, is the only one-dimensional family 
of densities having a constant contour hazard rate under the 
power-type transformation. 


An analogous argument for the n-dimensional case yields 
$h(x) = c\,[w^{-1}((\mathbf{c}^T x)^n) + a]^{-(\alpha-1)}$ (where the vector $\mathbf{c} > 0$ satisfies 
$|\mathbf{c}| = \prod_{i=1}^{n} c_i = 1/n!$ but is otherwise arbitrary) as a family 
of densities with a constant contour hazard rate under the power-type 
transform. More generally, let the relation between h and 
t be given by $h(x) = \phi(t(x))$, where φ is a suitably chosen 
positive function (from $\mathbb{R}_+$ to $\mathbb{R}_+$). Setting 

    $\phi(y)\, \frac{dV_{t,y}}{dy} = e^{-y}$, 

we obtain 

    $V_{t,y} = \int_{s=0}^{y} [\phi(s)]^{-1} e^{-s}\,ds = \chi(y)$, say, 

an increasing function of y. Choosing $t(x) = \chi^{-1}((\mathbf{c}^T x)^n)$, 
where the vector c is as above, we have 

    $h(x) = \phi(\chi^{-1}((\mathbf{c}^T x)^n))$ 

as a family of n-dimensional densities with a constant contour 
hazard rate under the transform φ. Finally, note that in the 
case of $n \ge 2$ this family can be extended by choosing in place 
of the form $(\mathbf{c}^T x)^n$ a more general function of x, k(x), say, 
satisfying $V_{k,y} = y$. For example, 
$k(x) = V_{m,m(x)} = \mathrm{vol}\{u \mid u \in \mathbb{R}^n_+ \text{ and } m(u) \le m(x)\}$, 
for a suitable $m: \mathbb{R}^n_+ \to \mathbb{R}_+$, where V is 
as above, will be an appropriate choice. 


6. SUMMARY 


While the "classical" hazard rate is concerned with the 
behavior of a r.v. over the points of $\mathbb{R}^n$ directly, the contour 
definition is determined through sets of points (i.e., the contours). 


All one-dimensional concepts for hazard rates such as IHR, 
DHR, IHRA, and DHRA carry over immediately to multivariate densities 
by means of a corresponding univariate density g, 
and comparisons of contour hazard rates - such as for IHR or DHR 
- can be accomplished for multivariate densities with a common 
contour set. Characterizing properties discussed in Section 5 
indicate that the contour and ordinary hazard rates are compatible 
as measures of IHR and DHR properties and thus justify the 
choices of the "volume reducing" transformations suggested in 
this paper. 


The proposed concepts may thus serve as a tool in studying 
various "hidden" properties and dependence interrelations of 
multivariate distributions and should be useful for generation 
of new families possessing desirable modelling features. 
FAILURE TIME DISTRIBUTIONS: ESTIMATES AND 
ASYMPTOTIC RESULTS 


JANOS GALAMBOS 


SUMMARY. The paper deals with life distributions for coherent 
systems of components. Two major questions are discussed: (i) 
estimation of system life in terms of component lives, and (ii) 
asymptotic models. Both questions are related to extremes of 
a sequence of random variables through the path set and cut set 
decompositions of coherent systems, which reduce a coherent system 
to either a parallel or a series system. It is pointed out that 
the classical theory of extremes of independent and identically 
distributed random variables does not provide an acceptable 
approximation. Hence, the emphasis is on dependence or on the 
case when the random variables are not identically distributed. 
The inequalities presented when discussing question (i) are appli- 
cable not only to extreme value problems but to an arbitrary 
multivariate distribution when lower dimensional marginals are 
specified. The asymptotic models are also discussed in the light 
of hazard rate properties of the limiting distributions for the 
extremes in a particular model. 


KEY WORDS. coherent system, component life, system life, extreme 
value distribution, dependent model, inequalities, multivariate 
distribution, Weibull distribution, hazard rate. 


1. LIFE DISTRIBUTION OF COHERENT SYSTEMS: THE 
REDUCTION TO EXTREME VALUE DISTRIBUTIONS 


In this introductory section we collect some definitions which 
are needed to present two general representations of coherent 
systems, the so-called minimal path set and minimal cut set 
representations. These two representations lead to the fact 
that the life distribution (failure time distribution) of every 
coherent system is an extreme value distribution in some dependent 
model of random variables. The dependence is the major emphasis 
here; the classical extreme value model with independent and identically 
distributed random variables is very rarely applicable 
in reliability theory. 


Let a system consist of n components. We denote by $X_j$
and T the random life length of the jth component and the system,
respectively. Evidently, the events $\{X_j > x\}$ and $\{T > x\}$
express the fact that the jth component and the system function
at time x. The following assumptions are made throughout this
paper: (i) the event $\{T > x\}$ is uniquely determined by the
events $\{X_j > x\}$, $1 \le j \le n$, or by their complements;
(ii) if $\{T > x\}$ occurs in the case $\{X_j \le x\}$ for some j,
then T remains larger than x if the jth component is replaced by a
functioning similar component; and (iii) $\{T > x\}$ does depend
on $\{X_j > x\}$ for each j. A system with these three properties
is called a coherent system with each of its components essential.
See Chapters 1 and 2 of Barlow and Proschan (1975) for the basic
properties of coherent systems.


There are two special systems which will play specific roles
in the sequel and which we define below.


Series system. If a system fails as soon as one of its components
fails, we call it a series system. Hence,

    $$T = \min(X_1, X_2, \ldots, X_n).$$


Parallel system. We call a system parallel, if it functions as
long as one of its components functions. It follows that

    $$T = \max(X_1, X_2, \ldots, X_n).$$


We shall now show that an arbitrary coherent system can be 
reduced to a parallel or a series system by a suitable grouping 
of the components. For proving this fact, we introduce the 
following concepts. 


Minimal path sets. A path set of a coherent system is a set C
of components such that if each member of C functions then the
system functions. A path set is minimal, if the removal of a
single element from it results in its failure to remain a path set.


Minimal cut sets. A set C of components is called a cut set if
the system fails whenever each member of C fails. A cut set
is minimal if no element can be removed from it without violating
its cut set property.


Now the set of all components of a coherent system which is
capable of functioning is necessarily both a path set and a cut
set. Therefore, every such coherent system has at least one
minimal path set and one minimal cut set. Let $A_1, A_2, \ldots, A_p$
and $C_1, C_2, \ldots, C_m$ be the distinct minimal path and cut
sets, respectively, for a given coherent system. Then, by
definition

    $$T = \max_{1 \le t \le p} \Big\{ \min_{j \in A_t} X_j \Big\} = \min_{1 \le t \le m} \Big\{ \max_{j \in C_t} X_j \Big\}.$$

Putting

    $$U_t = \min_{j \in A_t} \{X_j\}, \qquad V_t = \max_{j \in C_t} \{X_j\}, \tag{1}$$

we obtain

    $$T = \max_{1 \le t \le p} \{U_t\} = \min_{1 \le t \le m} \{V_t\}.$$


In other words, the system life of an arbitrary coherent system
is an extreme value of a suitably chosen sequence of random
variables. Its distribution function, which we call a failure time
distribution (for coherent systems), is therefore an extreme value
distribution in some appropriate model. It should be emphasized
that independence and stationarity cannot be assumed here
even if the original components are believed to function
independently (such a case is rare though; see the last but one
paragraph of the next section in this regard). Because the minimal
path sets $A_t$ (or the minimal cut sets $C_t$) are not disjoint,
the random variables $U_t$ (or $V_t$) of (1) are strongly dependent.
Their dependence is determined by the structure of the underlying
system and by the dependence of the original components. Hence,
their dependence is never a matter of arbitrary assumptions.
Thus the study of these distributions is an integral part of the
theory of the extremes for dependent models. While such a theory
is well developed in Chapter 3 of the present author's book,
Galambos (1978), we shall discuss some points of this theory as
they relate to failure time distributions. Although in the
previous paragraph we defined failure time distributions in terms
of coherent systems, much of what is to be said also applies to
failure times of dams, when failure is caused by high floods; of
air quality, when failure is defined by the fact that some
specific pollutant exceeds a given level of concentration; and to
other diverse fields. We shall, however, confine our discussion
to coherent systems, and we use the notations and concepts of the
present section throughout the paper.
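

The reduction of this section can be illustrated numerically. The
following is a minimal sketch in Python, assuming a hypothetical
five-component bridge structure (not taken from the paper) whose
minimal path and cut sets are listed in the code, and assuming
independent exponential component lives purely for convenience.
The names `life_from_paths`, `life_from_cuts` and `simulate` are
illustrative only.

```python
import random

# Hypothetical bridge structure with components numbered 0..4.
# Components 0 and 1 leave the source, 3 and 4 enter the sink,
# and component 2 is the bridge between the two branches.
MIN_PATH_SETS = [{0, 3}, {1, 4}, {0, 2, 4}, {1, 2, 3}]
MIN_CUT_SETS = [{0, 1}, {3, 4}, {0, 2, 4}, {1, 2, 3}]


def life_from_paths(x, path_sets=MIN_PATH_SETS):
    """T = max over minimal path sets A_t of U_t = min_{j in A_t} X_j."""
    return max(min(x[j] for j in a) for a in path_sets)


def life_from_cuts(x, cut_sets=MIN_CUT_SETS):
    """T = min over minimal cut sets C_t of V_t = max_{j in C_t} X_j."""
    return min(max(x[j] for j in c) for c in cut_sets)


def simulate(n_trials=1000, seed=0):
    """Draw component lives; return pairs (T via paths, T via cuts)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_trials):
        # Assumption: independent unit-rate exponential component lives.
        x = [rng.expovariate(1.0) for _ in range(5)]
        pairs.append((life_from_paths(x), life_from_cuts(x)))
    return pairs


if __name__ == "__main__":
    print(all(tp == tc for tp, tc in simulate()))
```

Both representations select one of the component lives, so the two
values agree exactly in every trial. Note that the path sets of
this sketch overlap, so the $U_t$ share components and are
dependent even though the component lives were drawn independently.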


2. ASYMPTOTIC EXTREME VALUE DISTRIBUTIONS AS 
FAILURE TIME DISTRIBUTIONS 


We have seen that T can always be expressed as the maximum
or the minimum of some random variables. Since in these
representations either p or m is large for a system with a large
number n of components (recall that we consider essential
components only), we assume for the general discussion that

    $$T = \max_{1 \le t \le p} \{U_t\}$$

with p large. Now if the distributions and the interdependence
of the $U_t$ are known then the distribution of T is uniquely
determined and can be computed by routine methods. Hence, only
that case presents a problem when either the distribution
functions $F_t(x) = P(U_t < x)$, $1 \le t \le p$, or the dependence
relation of the $U_t$ are unknown. In such cases, an approximation
becomes necessary. It is shown on p. 90 of Galambos (1978) (and
further explored in Galambos, 1981) that a reliable approximation
to the distribution of T cannot be obtained through an
approximation to $F_t(x)$. Rather, one should develop a dependent
model for the $U_t$, evaluate the possible limiting distributions
for the maximum in that model, and one of these possibilities is
to be applied as an approximation to the distribution of T.


There are a number of dependent models for which the mathe- 
matical results are at an advanced stage (although far from com- 
plete). These are described in Chapter 3 of the mentioned book 
of the present author, from which we quote the following results. 


(i) Approximation by the classical model (which would assume
that the $U_t$ are independent and identically distributed)
is very rarely justified, but when it is applicable, then
the failure time distribution is Weibull.


(ii) The assumption that the $U_t$ form a sequence of
exchangeable random variables is mathematically justified for an
arbitrary coherent system. But, because of this generality,
the possible limit laws for the maximum form a very large
family. The investigation of the properties of some
subfamilies of these distributions would be a very important
task. It should be remarked that, contrary to the claim
in the paper Zidek et al. (1979), the sequence $U_t$ cannot,
in general, be considered as a segment of an infinite
sequence of exchangeable variables; only finite
exchangeability is justified.


(iii) There is a general dependent model, in which the possible 
limit laws for the maximum coincide with the set of those 
distributions whose hazard rate function (defined below) is 
monotonic (see Sections 3.9 and 3.10 in the quoted book). 
Because of the significance of this result to engineers, 
we discuss this conclusion in more detail. 


Engineers have recognized for a long time the importance of
the hazard rate function of a failure time distribution. Let us
first give the definition of hazard rate. Let $X \ge 0$ be a random
variable with distribution function F(x) and with density
function f(x) = F'(x). Then the hazard rate r(x) of X is
defined by the limit relation

    $$r(x) = \lim_{\delta x \to 0} \frac{1}{\delta x} P(X < x + \delta x \mid X \ge x).$$

An easy calculation yields from this limit relation

    $$r(x) = f(x)/[1 - F(x)]. \tag{2}$$

While this latter form is a convenient formula for actually
calculating r(x), the definition of r(x) is what makes it
applicable, since it represents the rate at which X fails in a short
time interval $(x, x + \delta x)$, given that X has survived beyond x.
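

The agreement between the limit definition and formula (2) can be
checked numerically. The sketch below assumes, purely for
illustration, a Weibull distribution with shape 2 and scale 1
(so that r(x) = 2x); the function names and the step size `dx` are
hypothetical choices, not part of the paper.

```python
import math


def weibull_cdf(x, shape=2.0, scale=1.0):
    """F(x) = 1 - exp(-(x/scale)^shape) for x >= 0."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def weibull_pdf(x, shape=2.0, scale=1.0):
    """f(x) = F'(x)."""
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def hazard_from_formula(x):
    """Formula (2): r(x) = f(x) / [1 - F(x)]."""
    return weibull_pdf(x) / (1.0 - weibull_cdf(x))


def hazard_from_definition(x, dx=1e-6):
    """Finite-dx version of P(X < x + dx | X >= x) / dx."""
    survive = 1.0 - weibull_cdf(x)
    return (weibull_cdf(x + dx) - weibull_cdf(x)) / (survive * dx)


if __name__ == "__main__":
    for x in (0.5, 1.0, 1.5):
        print(x, hazard_from_formula(x), hazard_from_definition(x))
```

For a continuous distribution the distinction between strict and
non-strict inequalities in the conditioning event does not affect
the computation.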


It is apparent to an engineer that a new system of components 
whose life is represented by X may have some positive probability 
of failing immediately after production, but this probability 
decreases as time passes (burn-in period). On the other hand, 
an old system of components is more and more likely to fail as 
time passes (aging or wear-out period). Between the burn-in and 
wear-out periods, it is accepted that, for most systems, only 
accidents may cause failure (accidental failure period). The
stochastic definition of accidents is either by constant hazard
rate or by the lack of memory property
$P(X \ge x + y \mid X \ge x) = P(X \ge y)$. See Section 1.5 in
Galambos and Kotz (1978) to see that these two seemingly different
definitions are equivalent.


Since each of the three periods above represents a monotonic 
failure rate, it is very pleasing to see that the mathematical 
theory through structure functions and extreme value theory 
justifies the intuitive argument of the engineers. 


Another important consequence of accepting the existence
of an accidental failure period for a system is that it excludes
the possibility for the components to function independently.
Namely, if we write (2) in the form

    $$r(x) = -\,d\{\ln(1 - F(x))\}/dx,$$

we then get by integration

    $$F(x) = 1 - \exp\Big\{-\int_0^x r(t)\,dt\Big\}, \qquad x > 0.$$
0 


We thus see that when the hazard rate is constant, then F(x) is
exponential. In particular, during the accidental failure period,
life distributions are always exponential. Assume now that all
components as well as the system achieved the accidental failure
period. This means that both the components and the system have
exponential failure distributions (for a certain period of time
only). It then follows from a result of Esary et al. (1971) that
the components, with the exception of series systems, are
stochastically dependent. That is, one cannot construct a single
structure other than a series system in which the components
would function independently (and which system would achieve an
accidental failure period). This is a very important conclusion
because several estimates on reliability are developed in the
literature under the assumption that the components are
independent.
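

A small computation is consistent with this conclusion, although
it is of course no substitute for the cited result. The sketch
below assumes a hypothetical parallel system of two independent
components with unit-rate exponential lives and evaluates the
hazard rate of T = max(X_1, X_2) through formula (2). The function
names and the chosen rates are illustrative assumptions.

```python
import math


def parallel_cdf(x, rates=(1.0, 1.0)):
    """CDF of the maximum of independent exponential lives."""
    prod = 1.0
    for lam in rates:
        prod *= 1.0 - math.exp(-lam * x)
    return prod


def parallel_pdf(x, rates=(1.0, 1.0)):
    """Density of the maximum, by differentiating the product CDF."""
    total = 0.0
    for i, lam in enumerate(rates):
        term = lam * math.exp(-lam * x)
        for k, mu in enumerate(rates):
            if k != i:
                term *= 1.0 - math.exp(-mu * x)
        total += term
    return total


def parallel_hazard(x, rates=(1.0, 1.0)):
    """Formula (2) applied to the system life distribution."""
    return parallel_pdf(x, rates) / (1.0 - parallel_cdf(x, rates))


if __name__ == "__main__":
    for x in (0.1, 0.5, 1.0, 2.0, 5.0):
        print(x, parallel_hazard(x))
```

In this sketch the system hazard rate equals
$2(1 - e^{-x})/(2 - e^{-x})$, which increases from 0 towards 1, so
the system life is not exponential although each component life is.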


Finally, we remark that there are a number of characteriza- 
tion theorems for exponentiality (see the book Galambos and Kotz, 
1978) which can be used for testing whether a system is in its 
accidental failure period. In most cases, those limited charac- 
terization theorems are sufficient when one assumes a priori that 
the underlying distribution is of monotonic hazard rate. A 
typical result of this nature can be found in Ahsanullah (1977). 


3. ESTIMATES ON FAILURE TIME DISTRIBUTIONS 


We have emphasized in the previous section that components 
for most structures cannot function independently of each other. 
At the same time, we may know exactly the distribution of compon- 
ent lives, mainly through characterization theorems. This leads 
us to the problem of estimating failure time distributions by the 
distributions of component lives under some assumption of depen- 
dence of the components. 


We use the path set and cut set decompositions, in view of
which structure life is an extreme of "component lives" (where
"component" is either a minimal path set or a minimal cut set).
Through this approach, the mathematical problem is the estimation
of the distribution function $H_p(x)$ of

    $$T = \max(U_1, U_2, \ldots, U_p)$$

under some form of dependence of the $U_t$ and under the assumption
that the distribution functions $F_t(x) = P(U_t < x)$ are known.


There is one concept of dependence, the so-called association
of random variables, for which there is an extensive literature
with reliability emphasis (see Barlow and Proschan, 1975, and
Natvig, 1980). However, these works deal with estimating E(T)
in terms of $E(U_t)$, $1 \le t \le p$, rather than giving estimates
on $H_p(x)$.


Since we deal with distributions, we express dependence
through distributional assumptions. The simplest distributional
assumption is, of course, when only bivariate distributions are
involved. For simplicity, we introduce the notations

    $$A_j = A_j(x) = \{U_j \ge x\}$$

and we let $m_p = m_p(x)$ stand for the number of those $A_j$ which
occur. Then $H_p(x) = P(m_p = 0)$, $S_{1,p} = E(m_p)$ and
$2S_{2,p} + S_{1,p} = E(m_p^2)$. This latter meaning of $S_{1,p}$
and $S_{2,p}$ makes them appealing to the applied statistician,
while their original definition (as the sums
$S_{1,p} = \sum_j P(A_j)$ and $S_{2,p} = \sum_{i<j} P(A_i \cap A_j)$)
is the useful form in mathematical arguments.
It is slightly more convenient to state results for
$1 - H_p(x) = P(m_p \ge 1)$ than for $H_p(x)$ itself, and we shall
do so below. Let us consider estimates of the form

    $$a S_{1,p} + b S_{2,p} \le P(m_p \ge 1) \le c S_{1,p} + d S_{2,p}, \tag{3}$$

where a, b, c and d are constants (which, in principle, may
depend on x, suppressed in all notations).


The best lower bound in (3) is known (Kwerel, 1975, and
Galambos, 1977), according to which a and b should be of the
form a = 2/(k+1) and b = -2/[k(k+1)], where $1 \le k \le p$ is an
integer. It is then easy to find the optimal k, which equals
$[2S_{2,p}/S_{1,p}] + 1$, where [y] signifies the integer part of y.
(Notice that k = 1 yields the classical estimate by the method
of inclusion and exclusion.) For the upper bound in (3), only
partial results are available. The best known result is c = 1
and d = -2/p (Kounias, 1968, and Galambos, 1975).
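

The quoted coefficients can be tried out on a small example. The
sketch below is an illustration under stated assumptions, not the
method of the cited papers: it takes a hypothetical five-component
bridge structure, lets each component survive beyond x
independently with probability q, and treats $A_j$ as the event that
every component of the jth minimal path set survives. It then
computes $S_1 = E(m)$ and $S_2 = E[m(m-1)]/2$ by enumeration and
evaluates the lower bound $2S_1/(k+1) - 2S_2/[k(k+1)]$ with
$k = [2S_2/S_1] + 1$ and the upper bound $S_1 - 2S_2/p$ for
$P(m \ge 1)$.

```python
from itertools import product

# Hypothetical bridge structure; indices 0..4 label the components.
MIN_PATH_SETS = [{0, 3}, {1, 4}, {0, 2, 4}, {1, 2, 3}]


def event_moments(q, path_sets=MIN_PATH_SETS, n=5):
    """Return (S1, S2, P(m >= 1)) by enumerating all component states."""
    s1 = s2 = p_union = 0.0
    for state in product([0, 1], repeat=n):
        prob = 1.0
        for alive in state:
            prob *= q if alive else 1.0 - q
        # m counts the path sets all of whose components survive.
        m = sum(all(state[j] for j in a) for a in path_sets)
        s1 += prob * m
        s2 += prob * m * (m - 1) / 2
        if m >= 1:
            p_union += prob
    return s1, s2, p_union


def linear_bounds(s1, s2, p):
    """Lower and upper bounds on P(m >= 1) in terms of S1 and S2."""
    k = int(2 * s2 / s1) + 1
    lower = 2 * s1 / (k + 1) - 2 * s2 / (k * (k + 1))
    upper = s1 - 2 * s2 / p
    return lower, upper


if __name__ == "__main__":
    for q in (0.1, 0.5, 0.9):
        s1, s2, exact = event_moments(q)
        print(q, linear_bounds(s1, s2, len(MIN_PATH_SETS)), exact)
```

The enumeration is only feasible for small systems; its purpose is
to make the roles of $S_1$, $S_2$ and k concrete.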


Before proceeding with the discussion of the estimates in (3),
notice the following important fact. The results quoted in the
previous paragraph are such that the coefficients a, b, c and d
do not depend on x. Hence, they remain valid if we redefine
$A_j = A_j(x_j) = \{U_j \ge x_j\}$. Then

    $$\{m_p = 0\} = \{U_1 < x_1, U_2 < x_2, \ldots, U_p < x_p\},$$

and thus (3) provides estimates on the p-variate distribution of
the $U_j$ in terms of univariate and bivariate marginals. These
inequalities should be taken into account when one is interested
in constructing multivariate distributions with given (univariate
and bivariate) marginals.


Let us return to (3). Since $P(m_p \ge 1)$ is related to the
distribution of the maximum of the $U_j$, one would like to get
suitable extensions of (3) to $P(m_p \ge r)$, $r \ge 1$, which is
relevant for the distribution of the (p-r+1)st order statistic of
the $U_j$. Two methods of proof of (3) lead to interesting results
in this direction. One proof, which roughly says that (3) is
valid for an arbitrary (dependent) sequence of $U_j$ if it is valid
when the $U_j$ are i.i.d., implies that a set of coefficients in
(3) determines a set of coefficients for estimating $P(m_p \ge r)$
in the form of (3) (Galambos and Mucci, 1980). The other proof,
introduced in Galambos (1977), provides a technique which can be
used with success in more general cases than (3) (for example
to $P(m_p \ge r)$ for any $r \ge 1$, and with bounds not necessarily
linear). On this line of extensions of (3), we mention Sathe
et al. (1980). For earlier results on inequalities of the nature
of (3), see the survey at the end of Chapter 1 of Galambos (1978).



A NOTE ON SHOCK MODEL JUSTIFICATION FOR IFR
DISTRIBUTIONS


PURUSHOTTAM LAUD and ROY SAUNDERS 


Department of Mathematical Sciences 
Northern Illinois University 
DeKalb, Illinois 60115 USA 


SUMMARY. We consider the model in which the failure rate for a
device changes when the device is subjected to shocks which occur
stochastically over time. We show that increasing failure rate
distributions can be obtained by making simple models for the
effects of shocks. The results provide a physical motivation for
using the Weibull distributions for failure time data. Random
failure rates used in Bayesian inference are also obtained in
a similar manner by modeling the effects of shocks to be
stochastic.


KEY WORDS. Failure rate, Poisson process, shock model, stochastic 
failure rate, Weibull distribution, weak convergence. 


1. INTRODUCTION 


The problem of choosing a particular family of distributions 
for analyzing failure time data is difficult since in addition to 
requiring that some member of the family "fit" the data it is also 
desirable to have some physical justification for using the chosen 
family of distributions. 


As evidenced in the historical section of Barlow and Proschan 
(1965) early work on justifications was primarily concerned with 
specific families of distributions such as the exponential and 
Weibull families. The typical approach in the early work consisted 
of deriving specific families by considering failure times of 
complex pieces of equipment containing numerous components and 
making various reasonable assumptions about the joint failure 
distributions of the components.


More recently Esary et al. (1973) have presented
justifications for the use of more general families of distributions such
as the increasing failure rate distributions. These justifica- 
tions are based on a shock model which assumes that items are 
subject to shocks occurring as a Poisson process over time and 
that an item fails after receiving some random number of shocks. 
The justifications consist of showing that when the distributions 
of the number of shocks an item survives are of various forms 
then the failure rate must also have a certain general form such 
as being increasing. Generalizations of these justifications of 
Esary et al. to include different processes generating the shocks 
and random effects of shocks have been considered also. 


In this note we use the basic idea from the shock model and
apply it to failure rates in order to develop a novel method of
justifying the use of some distributions widely used to analyze
failure data. We also show how the method can be extended to
provide justifications for some random failure rate processes
which have been used by Dykstra and Laud (1980) in Bayesian
analysis of failure data.


Specifically we assume that an item is subjected to shocks
which, using X(t) to denote the number of shocks occurring in
[0,t], occur as a Poisson process $\{X(t): t \ge 0\}$ with mean
function $E\{X(t)\} = \lambda t$. In contrast to the Esary et al.
model, we assume that the shocks affect the failure rate of the item
and that the failure rate of an item is a stochastic failure
rate process $\{W(t): t \ge 0\}$ which is a function
$W(t) = h_t(\{X(\tau): 0 \le \tau \le t\})$ of the shock process.
By assuming two specific and intuitive forms for the functions
$h_t(\cdot)$ we show that widely used failure distributions can be
generated as limiting cases of this model when shocks occur
frequently and each shock has a small effect on the failure rate.
The results relating to Bayesian analysis are obtained in a
similar manner when the effects of the individual shocks are
allowed to be random variables as opposed to the fixed functions
$h_t(\cdot)$.


2. RESULTS FOR AN ADDITIVE MODEL AND A MULTIPLICATIVE MODEL 


The additive model assumes that the effect of successive
shocks is to increase the failure rate for the item. That is,
for some nonnegative real valued function $h(\cdot)$, the ith shock
increases the failure rate by the amount h(i) so that
$W(t) = h(0) + \cdots + h(X(t))$.


To consider the situation where numerous shocks occur in any
time period and each shock has a small effect let
$\{X_n(t): 0 \le t < \infty\}$ denote a sequence of Poisson
processes having mean functions $E\{X_n(t)\} = \lambda_n t$ and let
$\{h_n: n = 1, 2, 3, \ldots\}$ be a sequence of shock effect
functions for the additive model. Note that the number of shocks
will be large if $\lambda_n \to \infty$ as $n \to \infty$ and the
effect of the shocks on the hazard rate will be individually
small if $h_n(i) \to 0$ for all i as $n \to \infty$. With proper
choices for these rates $\lambda_n$ and functions $h_n(\cdot)$, the
stochastic failure rates $W_n(t) = h_n(0) + \cdots + h_n(X_n(t))$
converge to the deterministic failure rates of widely used
survival distributions. For example if we take
$\lambda_n = \delta n$, $h_n(0) = 0$, and $h_n(i) = \beta/n$
then, for each finite T > 0, the rates $\{W_n(t): 0 \le t \le T\}$
converge in probability to the deterministic failure rate function
$W(t) = \delta\beta t$. Thus the corresponding distribution of
failure times converges to a particular member of the Weibull
family with parameters $\alpha = 2$ and $\gamma = \delta\beta/2$
having density function

    $$f(t) = \gamma\alpha t^{\alpha-1} \exp(-\gamma t^{\alpha}), \qquad t \ge 0,\ \gamma > 0,\ \alpha > 0,$$

with f(t) = 0 otherwise, and failure rate function
$r(t) = \gamma\alpha t^{\alpha-1}$.
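

The convergence in this example can be observed by simulation.
The sketch below is a minimal illustration assuming the parameter
values $\delta = 2$ and $\beta = 0.5$ (hypothetical choices) and
generating the Poisson process from exponential inter-arrival
times; the function names are illustrative only.

```python
import random


def poisson_count(rate, t, rng):
    """Number of events in [0, t] of a homogeneous Poisson process."""
    count, clock = 0, rng.expovariate(rate)
    while clock <= t:
        count += 1
        clock += rng.expovariate(rate)
    return count


def additive_rate(t, n, delta=2.0, beta=0.5, seed=0):
    """W_n(t) for lambda_n = delta*n, h_n(0) = 0 and h_n(i) = beta/n."""
    rng = random.Random(seed)
    return poisson_count(delta * n, t, rng) * beta / n


def limiting_rate(t, delta=2.0, beta=0.5):
    """Deterministic limit W(t) = delta*beta*t (Weibull, alpha = 2)."""
    return delta * beta * t


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, additive_rate(1.5, n), limiting_rate(1.5))
```

As n grows, the simulated rate at a fixed time settles near
$\delta\beta t$, in line with the law of large numbers for the
Poisson count $X_n(t)/n$.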


Other members of the Weibull family can be obtained similarly.
To do this take $\lambda_n = \delta n$ with
$\delta = \gamma^{1/(\alpha-1)}$, $h_n(0) = 0$ and
$h_n(i) = \alpha[(i/n)^{\alpha-1} - ((i-1)/n)^{\alpha-1}]$ for
$i \ge 1$. In this case we have

    $$\sum_{0 \le j \le i} h_n(j) = \alpha (i/n)^{\alpha-1} \quad\text{and}\quad W_n(t) = \alpha (X_n(t)/n)^{\alpha-1},$$

which converges in probability to the deterministic failure rate
$W(t) = \gamma\alpha t^{\alpha-1}$. It is interesting to note that
when $\alpha = 2$ then $h_n(i+1) - h_n(i) = 0$ which implies
constant shock effect while, for $\alpha = 3$,
$h_n(i+1) - h_n(i) = 2h_n(1)$ which implies each shock
affects the failure rate by two more units than the previous shock.
In general $h_n(i+1) - h_n(i)$ is equal to
$[(i+1)^{\alpha-1} + (i-1)^{\alpha-1} - 2i^{\alpha-1}]\,h_n(1)$
so that for $\alpha > 2$ the successive shocks have
increasing effects while for $1 < \alpha < 2$ the successive shocks
have decreasing effects.
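

The telescoping sum and the behaviour of successive shock effects
can be verified directly. The short sketch below assumes the shock
effect function $h_n(i) = \alpha[(i/n)^{\alpha-1} - ((i-1)/n)^{\alpha-1}]$
with $h_n(0) = 0$; the function names are hypothetical.

```python
def shock_effect(i, n, alpha):
    """h_n(i) for the additive model leading to a Weibull failure rate."""
    if i == 0:
        return 0.0
    return alpha * ((i / n) ** (alpha - 1) - ((i - 1) / n) ** (alpha - 1))


def cumulative_rate(i, n, alpha):
    """Sum of h_n(j) over 0 <= j <= i; telescopes to alpha*(i/n)^(alpha-1)."""
    return sum(shock_effect(j, n, alpha) for j in range(i + 1))


if __name__ == "__main__":
    n = 100
    for alpha in (1.5, 2.0, 3.0):
        effects = [shock_effect(i, n, alpha) for i in range(1, 6)]
        print(alpha, effects, cumulative_rate(50, n, alpha))
```

For $\alpha = 2$ the printed effects are constant, for $\alpha = 3$
they grow by $2h_n(1)$ per shock, and for $\alpha = 1.5$ they
decrease.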


A second way of modeling the shock effects is to assume that
each shock increases the failure rate by a factor 1 + h(i)
where h(i) is again a nonnegative real valued function and the
stochastic failure rate process is given by
$W(t) = h(0) \times (1 + h(1)) \times \cdots \times (1 + h(X(t)))$.
If for this multiplicative
model we consider sequences as above then we also obtain
well-known deterministic failure rates in the limit. The Weibull


: is obtained by using a = 6n, h (0) = 


a- 
failure rate yat 
1/a 


nets h(i) = (1 +a/i) for i31, and 6 = [yal (a+1) ] 
0 


since then Gh = 1) 


= (8X (t) + 1) (n6)"7 + /na(a) 


iL 


-1 a-1 
which converges in probability to Bet /T(atl) = yat 2 


As a second example note that to obtain the limiting failure
rate $r(t) = (\alpha/\gamma)\exp(\alpha t)$ for $t \ge 0$ which
arises in the modified extreme value distribution (see, e.g.,
Barlow and Proschan, 1965) having density

    $$f(t) = (\alpha/\gamma)\exp\{\alpha t - (e^{\alpha t} - 1)/\gamma\}, \qquad t \ge 0,\ \gamma > 0,\ \alpha > 0,$$

with f(t) = 0 otherwise, choose $\lambda_n = \delta n$,
$h_n(0) = \alpha/\gamma$, and $h_n(i) = \beta/n$ for $i \ge 1$,
where $\delta\beta = \alpha$. While it is possible to rewrite
multiplicative models as additive models in most cases, one of the
models should have more intuitive appeal in a given physical
situation.
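

The limit in this second example follows from
$(1 + \beta/n)^{X_n(t)} \to \exp(\delta\beta t)$ and can be
observed by simulation. The sketch below assumes the hypothetical
parameter values $\alpha = 1$, $\gamma = 2$ and $\delta = 2$ (so
that $\beta = \alpha/\delta$); the function names are illustrative.

```python
import math
import random


def poisson_count(rate, t, rng):
    """Number of events in [0, t] of a homogeneous Poisson process."""
    count, clock = 0, rng.expovariate(rate)
    while clock <= t:
        count += 1
        clock += rng.expovariate(rate)
    return count


def multiplicative_rate(t, n, alpha=1.0, gamma=2.0, delta=2.0, seed=0):
    """W_n(t) = h_n(0) * (1 + beta/n)^X_n(t) with h_n(0) = alpha/gamma."""
    beta = alpha / delta  # enforces delta * beta = alpha
    rng = random.Random(seed)
    shocks = poisson_count(delta * n, t, rng)
    return (alpha / gamma) * (1.0 + beta / n) ** shocks


def limiting_rate(t, alpha=1.0, gamma=2.0):
    """Deterministic limit r(t) = (alpha/gamma) * exp(alpha * t)."""
    return (alpha / gamma) * math.exp(alpha * t)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, multiplicative_rate(1.0, n), limiting_rate(1.0))
```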


3. RESULTS FOR RANDOM SHOCK EFFECTS 


In this section we consider an additive model in which each
shock increases the failure rate by a random amount. Let
a(t), $t \ge 0$, be a continuous increasing function of t with
a(0) = 0 and $\lim a(t) = M < \infty$ as $t \to \infty$. For each
$n \ge 1$ let $H_n(i)$, $i \ge 1$, be a sequence of independent
gamma random variables with shape parameters a(i/n) - a((i-1)/n)
and scale parameter 1 and let $H_n(0)$ be degenerate at 0. For
each n let $\{X_n(t): t \ge 0\}$ be a homogeneous Poisson process
with mean function nt. As in the preceding section we model the
shocks to occur as events in $X_n(\cdot)$. The ith shock, however,
increases the failure rate by a random amount $H_n(i)$. We take
the shock effects random variables $\{H_n(i): i \ge 0\}$ to be
independent of the shock process $\{X_n(t): t \ge 0\}$. We thus
obtain the stochastic failure rate
$W_n(t) = H_n(0) + \cdots + H_n(X_n(t))$. Now, as n increases,
the number of shocks in any interval is large and each shock
effect is stochastically small. In the limit the random variable
$W_n(t)$ converges to a gamma random variable with shape parameter
a(t) for each fixed t since the characteristic function of
$W_n(t)$ is

    $$\phi_n(\theta) = E\Big[\exp\Big\{i\theta \sum_{j=1}^{X_n(t)} H_n(j)\Big\}\Big]$$

    $$= E\Big\{E\Big[\exp\Big(i\theta \sum_{j=1}^{X_n(t)} H_n(j)\Big) \,\Big|\, X_n(t)\Big]\Big\}$$

    $$= E\Big[\prod_{j=1}^{X_n(t)} \{1/(1 - i\theta)\}^{a(j/n) - a((j-1)/n)}\Big]$$

    $$= E\big[\{1/(1 - i\theta)\}^{a(X_n(t)/n)}\big],$$

which converges to $\{1/(1 - i\theta)\}^{a(t)}$. A multivariate
extension of this argument establishes the convergence of the
finite dimensional distributions of the sequence $\{W_n(\cdot)\}$
to those of a gamma process with parameter function $a(\cdot)$.
Using the definition and discussion of the space $D[0,\infty]$ in
Rao and Sethuraman (1975) and a well-known sufficient condition
(see Billingsley, 1968, Theorem 15.6) for weak convergence, it is
straightforward to conclude weak convergence of
$\{W_n(t): t \ge 0\}$ to a gamma process with parameter function
$a(\cdot)$.

Reynolds and Savage (1971) employed the gamma process and, more
generally, an independent-increment stochastic process to describe
random failure rates which are introduced by Gaver (1963) to
explain failures of components subjected to random environments.
Dykstra and Laud (1980) use the extended gamma process in Bayesian
estimation of an increasing failure rate. The above random shock
effects model can be used to obtain these random failure rates.
Following Reynolds and Savage (1971), consider an independent
increment stochastic process $\{Y(t): t \ge 0\}$ with no normal
component, no deterministic component, and no fixed points of
discontinuity. For such a process the logarithm of the
characteristic function of Y(t) has the Lévy representation
(Notation: $E[\exp\{iuY(t)\}] = \exp\{\psi_t(u)\}$)

    $$\psi_t(u) = \int_0^\infty \{\exp(iuz) - 1\}\, dN(t,z)$$

where (i) N is a nondecreasing function of z and t with
nonnegative second differences and continuous in t; (ii)
$N(t,\infty) = 0$ for $t \ge 0$; and


    (iii)' $\int_0^\infty \frac{z^2}{1+z^2}\, dN(t,z) < \infty$ for $t \ge 0$.


We also require that the mean function of the process is bounded.
Thus (iii)' is replaced by

    (iii) $\int_0^\infty z\, dN(t,z) < M < \infty$ for all $t \ge 0$.

To obtain this random failure rate $\{Y(t): t \ge 0\}$, suppose
that the random shock effect $H_n(i)$ has a distribution with
characteristic function

    $$E[\exp\{iuH_n(i)\}] = \exp\Big[\int_0^\infty (\exp(iuz) - 1)\, d\{N(i/n, z) - N((i-1)/n, z)\}\Big].$$

Again letting $W_n(t) = H_n(0) + \cdots + H_n(X_n(t))$ where
$H_n(0)$ is degenerate at zero, it can be shown, by using
techniques similar to those used above to obtain the gamma random
failure rate, that the sequence $\{W_n(t): t \ge 0\}$ converges
weakly to $\{Y(t): t \ge 0\}$.


This provides a physical model justifying the random failure rates 
used by the above-mentioned authors. 
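

The gamma case treated earlier in this section can be explored by
simulation. The sketch below is only an illustration under stated
assumptions: it takes the hypothetical parameter function
$a(t) = M(1 - e^{-t})$ with M = 2, draws the shock effects $H_n(i)$
as gamma variables with shape a(i/n) - a((i-1)/n) and scale 1,
and compares the sample mean of $W_n(t)$ with a(t), which is the
mean of a gamma variable with shape a(t) and scale 1. The function
names, n, and the number of replications are illustrative choices.

```python
import math
import random


def a(t, m=2.0):
    """Hypothetical parameter function: increasing, a(0) = 0, a(t) -> m."""
    return m * (1.0 - math.exp(-t))


def random_rate(t, n, rng):
    """One draw of W_n(t) = H_n(1) + ... + H_n(X_n(t)); H_n(0) = 0."""
    total, i = 0.0, 0
    clock = rng.expovariate(n)  # shocks arrive at rate n
    while clock <= t:
        i += 1
        shape = a(i / n) - a((i - 1) / n)
        total += rng.gammavariate(shape, 1.0)
        clock += rng.expovariate(n)
    return total


def mean_rate(t, n=100, reps=1000, seed=0):
    """Monte Carlo estimate of E[W_n(t)]."""
    rng = random.Random(seed)
    return sum(random_rate(t, n, rng) for _ in range(reps)) / reps


if __name__ == "__main__":
    print(mean_rate(1.0), a(1.0))
```

Comparing means is a weak check; a fuller study would compare the
whole empirical distribution of $W_n(t)$ with the limiting gamma
law.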


4. CONCLUSION 


The purpose of this article is to point out that when it is
possible to envision shocks affecting the failure rate of an item,
assumptions about the way the shocks affect the failure rate can
lead to specific well-known models of the survival distributions
or failure rates. We believe these results also can be used for
approximating and suggesting priors for Bayesian nonparametric
estimation of failure rate functions.



ON THE MEAN RESIDUAL LIFE FUNCTION IN SURVIVAL
STUDIES


RAMESH C. GUPTA 


Department of Mathematics 
University of Maine at Orono 
Orono, Maine 04469 USA 


SUMMARY. In reliability studies, the expected additional life
time given that a component has survived until time t is called
the mean residual life function (MRLF). This MRLF determines the
distribution function uniquely. In this paper an interpretation
of MRLF in renewal theory is presented and some characterizations
of the exponential distribution are obtained. Finally, considering
the general MRLF, a method is developed for obtaining the mixing
distribution when the original distribution is exponential. Some
examples are discussed, in one of which Morrison's (1978) result
is obtained as a special case.


KEY WORDS. mean residual life function, failure rate, survival 
analysis, characterizations, renewal process, mixture. 


1. INTRODUCTION 


In life testing situations, the expected additional life
time given that a component has survived until time t is called
the mean residual life function (MRLF). More specifically, if X
is the life of a component, then $E(X - t \mid X > t)$ is called the
MRLF. This MRLF has been employed in life length studies by
various authors, e.g. Hollander and Proschan (1975), Bryson and
Siddiqui (1969) and Muth (1977). Limiting properties of the MRLF
have been studied by Meilijson (1972) and Balkema and de Haan
(1974). It has also been shown by the author (1975) that the MRLF
determines the distribution function uniquely.


In reliability studies, the survival function is often
characterized by its failure rate (hazard rate). An essential
difference between the failure rate and the MRLF is that the
former accounts only for the immediate future in assessing the
event of component failure, whereas the latter accounts for the
complete future. Thus in a sense MRLF should play as important a
role as the failure rate.


Watson and Wells (1961) have found general conditions on a 
life distribution so that the mean remaining life of articles, 
operated for some fixed test period, is greater than the original 
mean life and, in this context, they have examined such well- 
known life distributions as the Weibull, gamma, lognormal and 
extreme value. Weiss and Dishon (1971) extend the work of
Watson and Wells by assigning cost functions and by considering
the economic aspects of burn-in and replacement. Muth (1977)
focuses on the decreasing mean residual life functions and the
properties of the associated probability distributions. He also
defines the properties of negative memory, no memory, positive
memory and perfect memory as associated with the MRLF, and
examines several well-known distributions in terms of their
memory properties.


After presenting some preliminary results in Section 2, we 
compare MRLF and failure rate in Section 3. It is shown that if 
the life distribution has increasing failure rate (IFR), then 
MRLF is decreasing but not conversely. Some comparisons of total 
life and MRLF are also mentioned. In Section 4, an interpretation 
of MRLF in renewal theory is presented and some characterizations 
of the exponential distributions are obtained. More specifically, 
we show that (i) if the renewal distribution has IFR and the mean 
life is equal to the mean residual life, then the distribution is 
exponential and (ii) if the renewal distribution belongs to the 
one parameter exponential family and if the mean life is equal 
to the mean residual life, then the distribution is exponential. 


Finally, there exist many life testing situations which can
be best described as mixtures of distributions. In Section 5
we consider the general MRLF and develop a method of obtaining
the mixing distribution when the original distribution is
exponential. Some examples are discussed, in one of which
Morrison's (1978) result "The gamma distribution is the unique
mixing distribution of the exponential that leads to a linearly
increasing MRLF" is obtained as a special case.


2. PRELIMINARY RESULTS


Let X be a nonnegative random variable denoting the life 
of a component having distribution function F(x) and probability 
density function (pdf) £(x). Then the failure rate of X is 
defined as A(t) = £(t)/F(t), where F(t) = 1- F(t) is called 
the survival function. If r(t) = E(X - t|x > t) denotes the 
MRLF, then 


co 18 
r(t) = f F(x)dx/F(t) = [u - f F(x)dx]/F(t), (1) 
t (0) 


Differentiating equation (1), one obtains 
NG 5 ee 5 Re ake NEA Co (2) 

which expresses X(t) in terms of r(t). 
It is well known that X(t) determines the distribution 


function uniquely and hence r(t) also determines F(x) and 
we have 


t 
F(t) = zea exp{—f tae) (3) 
0 


(see Muth, 1977, and Laurent, 1974). This result has also been 
obtained by Gupta (1979) using a completely different approach. 


Thus $\bar{F}(t)$, $\lambda(t)$ and r(t) are all equivalent in the sense 
that, given one of them, the other two can be determined. Hence 
in the analysis of survival data, one sometimes estimates 
$\lambda(t)$ or r(t) instead of $\bar{F}(t)$ according to the convenience 
of the procedures available. 
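
The equivalence of $\bar{F}$, $\lambda$ and r stated above can be checked numerically. The following is only a sketch under stated assumptions: the Weibull survival function exp(-t^2) is an illustrative choice that does not come from the paper, the infinite integral in (1) is truncated at a hypothetical cutoff `upper`, and `integrate` is a hand-written Simpson rule.

```python
import math


def integrate(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def survival(t):
    # Assumed example: Weibull life with shape 2, so survival is exp(-t^2).
    return math.exp(-t * t)


def failure_rate(t):
    # lambda(t) = f(t) / survival(t) = 2t for this example (an IFR case).
    return 2.0 * t


def mrlf(t, upper=12.0):
    # Equation (1); the infinite upper limit is truncated at `upper`.
    return integrate(survival, t, upper) / survival(t)


def survival_from_mrlf(t):
    # Equation (3): rebuild the survival function from r alone.
    inner = integrate(lambda x: 1.0 / mrlf(x), 0.0, t, n=200)
    return mrlf(0.0) / mrlf(t) * math.exp(-inner)


def failure_rate_from_mrlf(t, h=1e-4):
    # Equation (2), with r'(t) approximated by a central difference.
    r_prime = (mrlf(t + h) - mrlf(t - h)) / (2.0 * h)
    return (1.0 + r_prime) / mrlf(t)


if __name__ == "__main__":
    for t in (0.5, 1.0, 1.5):
        print(t, survival(t), survival_from_mrlf(t),
              failure_rate(t), failure_rate_from_mrlf(t))
```

In this example the reconstructed survival function and failure rate should agree with the direct ones up to quadrature error, and the computed MRLF decreases in t, as expected for an IFR distribution.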


3. COMPARISON OF MRLF AND THE FAILURE RATE 


We say that X has increasing failure rate (IFR) if $\lambda(t)$ 
is nondecreasing and likewise we define decreasing failure rate 
(DFR) distributions. The following result shows that IFR implies 
decreasing MRLF, but the converse is not true. A dual result 
holds for DFR distributions. 


Theorem 1. If F is IFR then the MRLF is decreasing, but not 
conversely. In particular, $r(t) \le r(0)$ for IFR distributions. 


Proof. The hypothesis implies that $\bar{F}(t+s)/\bar{F}(t)$ is decreasing in 
t for each $s \ge 0$. [Take the logarithmic derivative with respect 
to t.] The result follows upon integrating this ratio from 
s = 0 to $s = \infty$. Bryson and Siddiqui (1969) have given an 
example showing that the converse is not true. Another counter- 
example to the converse is provided by taking 


v (t= 1/aete)yeandaaddt), SL Hae kan ZEAE Oe 


4. INTERPRETATION OF MRLF IN RENEWAL THEORY AND 
SOME CHARACTERIZATIONS OF THE EXPONENTIAL DISTRIBUTION 


Suppose a component operating in a system is replaced upon 
failure by another component possessing the same life distribu- 
tion, so that the sequence of component life lengths forms a 
renewal process. At any time t, the component in operation 
is identified for study. Let $U_t$ be the age of the component 
in use at time t, and let $V_t$ be the remaining life of the compon- 
ent (time from t until failure). The quantities $U_t$ and $V_t$ 
are known as backward and forward recurrence times, respectively, 
in renewal theory. 

It may be observed that $U_t$ and $V_t$ are independent if 
and only if the renewal distribution is exponential. Attention 
has been given in the literature to characterizing the Poisson 
process and hence the exponential distribution by certain pro- 
perties of the distribution of $V_t$ (see Chung, 1972). Cinlar 
and Jagers (1973) and Holmes (1974) present a characterization by 
the mean value of $V_t$. More specifically, if $E(V_t)$ is finite 
and is independent of t, then the process is Poisson. 


For large values of t, assuming the life distribution is 
nonlattice, the limiting pdf of $U_t$ or $V_t$ is given by 

$$f_Y(x) = \bar{F}(x)/\mu, \qquad (4)$$

where $\bar{F}(x) = 1 - F(x)$, $\mu = E(X) < \infty$ and F(0) = 0 (see Cox, 
1962). Let Y be a random variable having pdf (4) representing 
the residual time of faultless operation of the component. Some 
applications of the distribution of Y in life length studies 
are described in Blumenthal (1967) and Scheaffer (1972). 


Now $\lambda_Y(t)$, the failure rate of Y, is related to the MRLF 
of X by 

$$\lambda_Y(t) = \bar{F}(t) \Big/ \int_t^{\infty} \bar{F}(x)\,dx = 1/r(t).$$

Thus, Theorem 1 is equivalent to the statement "Y is IFR 
whenever X is IFR." We note that 


$$\bar{F}_Y(t) = \bar{F}(t)\,[r(t)/r(0)].$$


Assume that X is IFR so that $r(t)/r(0) \le 1$. Integrating 
the preceding equation over t gives $E(Y) \le E(X)$, with equality 
if and only if r(t) = r(0) for all t. This gives us the following charac- 
terization of the exponential distribution. 


Theorem 2. Suppose the renewal distribution is IFR (or DFR). 
Then E(Y) = E(X) if and only if X has an exponential distri- 
bution. 


The following theorem shows that if we drop the condition 
of IFR, then the equality of mean values of X and Y charac- 
terizes the exponential distribution in the one-parameter expon- 
ential family. 


Theorem 3. Suppose the renewal distribution belongs to the one 
parameter exponential family with pdf 


PUR vidoe (0) nixiee 


If E(Y) = E(X) for all $\theta$ in some interval I, then X has 
an exponential distribution for all $\theta$ in I. 


Proof. Using the Laplace transform technique, it can be seen that 
the gamma distribution is characterized within the linear expon- 
ential family by the property that its coefficient of variation 
is independent of $\theta$ (Mafoud, 1977, p. 24; Ratnaparkhi, 1981). 
Further, having a coefficient of variation equal to unity is 
characteristic of the exponential distribution within the gamma 
family. Thus it is sufficient to show that E(X) = E(Y) implies 
$\mathrm{Var}(X) = [E(X)]^2$. But this is evident since, in general, 


$$E(Y) = \int_0^{\infty} y f_Y(y)\,dy = \int_0^{\infty} y \bar{F}(y)\,dy \Big/ \mu = E(X^2)/(2\mu),$$

so that $E(Y) = \mu$ gives $E(X^2) = 2\mu^2$, i.e. $\mathrm{Var}(X) = \mu^2$. 
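
The identity E(Y) = E(X^2)/(2 mu) used in the proof can be illustrated with simple arithmetic. The sketch below assumes a gamma life distribution with hypothetical shape and scale values; the function names are illustrative. For the gamma family E(Y) equals E(X) only when the shape is one, which is the exponential case.

```python
def mean_life(shape, scale):
    """E(X) for X ~ gamma(shape, scale)."""
    return shape * scale


def mean_equilibrium(shape, scale):
    """E(Y) = E(X^2) / (2 mu), with E(X^2) = shape (shape + 1) scale^2."""
    second_moment = shape * (shape + 1.0) * scale ** 2
    return second_moment / (2.0 * mean_life(shape, scale))


def coefficient_of_variation(shape):
    """CV of a gamma distribution; it does not depend on the scale."""
    return 1.0 / shape ** 0.5


if __name__ == "__main__":
    scale = 3.0  # hypothetical value
    for shape in (0.5, 1.0, 2.0, 4.0):
        print(shape, mean_life(shape, scale),
              mean_equilibrium(shape, scale),
              coefficient_of_variation(shape))
```

For shape above one (IFR) the sketch gives E(Y) below E(X), and for shape below one (DFR) the inequality reverses, in line with Theorem 2.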


5. MRLF OF A MIXTURE 


Recently Morrison (1978) has shown that the gamma distri- 
bution is the unique mixing distribution of the exponentials that 
leads to a linearly increasing MRLF. In the following we have 
considered the general MRLF and have developed a method of 
obtaining the mixing distribution when the original distribution 
is exponential. Some examples are discussed, in one of which 
Morrison's result is obtained as a special case. 
Let X be a nonnegative random variable, denoting the life 
of a component, having distribution function $F(t;\theta)$ and let 
F(t) be the distribution function of X after mixing on $\theta$. 
Then, in terms of the corresponding survival functions, 


$$E_\theta[\bar{F}(t;\theta)] = \bar{F}(t). \qquad (5)$$


Suppose now the original distribution is exponential, i.e., 
$\bar{F}(t;\theta) = \exp(-t\theta)$. We obtain from (5) that 


$$G(t) = \bar{F}(t), \qquad (6)$$


where G(t) is the Laplace transform of the distribution of $\theta$. In particular, the 
mixing distribution is uniquely determined by the unconditional 
survival function of X, i.e. a mixture of exponentials is 
identifiable. 


Equation (6) has several applications. Firstly, using 
Bernstein's characterization (see Feller, 1971, p. 439) of 
Laplace transforms, we may obtain necessary and sufficient con- 
ditions for a distribution to be a mixture of exponentials. 
Secondly, expressing $\bar{F}$ in terms of r or $\lambda$, the Laplace 
transform of $\theta$ can be written in terms of either of these 
functions. Some applications of the first type are summarized 
in the following theorem. 


Theorem 4. Let X be a nonnegative random variable with failure 
rate $\lambda(t)$ and MRLF r(t). Then (a) X is a mixture of expon- 
entials if and only if its survival function is completely mono- 
tone. (b) $\lambda(t)$ is completely monotone if and only if X is a 
mixture of exponentials with infinitely divisible mixing distri- 
bution. (c) If r(t) has a completely monotone derivative, 
then X is a mixture of exponentials with infinitely divisible 
mixing distribution. 


Proof. Part (a) follows from Bernstein's theorem. For (b) use 
Theorem 1 of Feller (1971, p. 450). Finally, if r(t) has a 
completely monotone derivative then 1/r(t) is completely monotone 
(Feller, 1971, Criterion 2, p. 441), as is 1 + r'(t). Since the 
product of completely monotone functions is completely monotone, it 
follows that $\lambda(t)$ is completely monotone (see equation 2). 


Since $\lambda_Y = 1/r$, we see by the argument just given that 
the (limiting) recurrence time distributions are also mixtures 
of exponentials with infinitely divisible mixing distribution 
if r(t) has a completely monotone derivative. 


We conclude with some examples in which equation (6) is 
applied to determine the mixing distribution. 


Example 1. Let r(t) = a + bt. Employing equations (3) and (6) 
one obtains 


$$G(t) = [a/(a+bt)]^{k},$$


where k = 1 + 1/b, which is the Laplace transform of a gamma 
distribution. Thus a linear MRLF leads to a gamma distribution 
for $\theta$. Note that this was the main result obtained by Morrison 
(1978). Also his expression for G'(t)/G(t) needs a minus sign. 
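
Example 1 can be checked numerically along the following lines. This is a sketch under assumptions: the values of `A` and `B` are hypothetical, the gamma mixing distribution is taken with shape k and rate a/b, and the infinite integral defining the Laplace transform is truncated at an arbitrary cutoff `upper`.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


A, B = 2.0, 0.5          # hypothetical linear MRLF r(t) = A + B t
K = 1.0 + 1.0 / B        # exponent k = 1 + 1/b


def survival_closed_form(t):
    return (A / (A + B * t)) ** K


def survival_from_equation_3(t):
    # Equation (3) with r(t) = A + B t, integrating 1/r numerically.
    inner = simpson(lambda x: 1.0 / (A + B * x), 0.0, t)
    return A / (A + B * t) * math.exp(-inner)


def gamma_laplace_transform(t, upper=10.0):
    # E[exp(-t * theta)] for theta ~ gamma(shape K, rate A / B),
    # with the infinite range truncated at `upper`.
    rate = A / B

    def integrand(theta):
        density = (rate ** K * theta ** (K - 1.0)
                   * math.exp(-rate * theta) / math.gamma(K))
        return math.exp(-t * theta) * density

    return simpson(integrand, 0.0, upper)


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.5, 4.0):
        print(t, survival_closed_form(t), survival_from_equation_3(t),
              gamma_laplace_transform(t))
```

The three columns should agree up to quadrature error, which is what equation (6) asserts for this choice of r(t).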


Example 2. Let $\lambda(t) = \alpha t^{\alpha-1}$ with $0 < \alpha < 1$. Proceeding as in 
Example 1 we get $G(t) = \exp(-t^{\alpha})$, which is the Laplace trans- 
form of a stable distribution. 


The author wishes to thank the referees for some useful 
comments. 
IDENTIFIABILITY PROBLEMS IN THE THEORY OF 
COMPETING AND COMPLEMENTARY RISKS — A SURVEY 


ASIT P. BASU 


Department of Statistics 
University of Missouri 
Columbia, Missouri 65211 USA 


SUMMARY. In this expository paper the concepts of competing and 
complementary risks are defined and a survey of recent results 
in the area is presented. Identifiability of distributions, 
both univariate and multivariate, useful in reliability and 
survival analysis is considered. 


KEY WORDS. competing risks, complementary risks, identifiability, 
reliability, distributions of minimum and maximum, series and 
parallel systems. 


1. INTRODUCTION 


The problem of identifiability (or of nonidentifiability) 
arises naturally in a number of physical situations. The problem, 
in general terms, can be defined as follows. Let U be an 
observable random variable whose distribution function belongs to 
a family $\mathcal{F} = \{F_\theta : \theta \in \Omega\}$ of distribution functions indexed by a 
parameter $\theta$. Here $\theta$ could be scalar or vector valued. 
We shall say $\theta$ is nonidentifiable by U if there are 
distinct parameter values, $\theta$ and $\theta'$, such that $F_\theta(u) = F_{\theta'}(u)$ 
for all u. In the contrary case we shall say $\theta$ is identi- 
fiable. It may happen that $\theta$ itself is nonidentifiable but a 
function $\gamma(\theta)$ is identifiable in the following sense: For any 
$\theta, \eta \in \Omega$, $F_\theta(u) = F_\eta(u)$ for all u implies $\gamma(\theta) = \gamma(\eta)$; in 
this case we may say $\theta$ is partially identifiable. 
Puri (1970) has surveyed some examples arising in the general 
literature. The purpose of the present paper is to present 
a survey of results available in the area of competing risks 
and complementary risks. We primarily consider the cases where 
we have an underlying parametric model. In Sections 2 and 3 we 
define the problems of competing and complementary risks and 
discuss the case when the underlying random variables are inde- 
pendently distributed. The case of dependent random variables 
is considered in Section 4. The problem of estimation is briefly 
discussed in Section 5. Finally, in Section 6, some open problems 
and other related areas of research are pointed out. 


2. INDEPENDENT RANDOM VARIABLES WITH IDENTIFIED EXTREMUM 

The problem of competing risks, in its simplest form, may 
be described as follows: Let $X_i$ be a random variable with 
distribution function $F_i(x)$ (i = 1,2,...,p). We assume that 
the $X_i$'s are not observable, but that $U = \min(X_1, X_2, \ldots, X_p)$ is. 
We would like to estimate the $F_i$'s given the observations on 
U. 

This model finds interesting applications in a number of 
fields but particularly in problems of survival analysis and 
reliability theory. Thus in problems of competing risks, or in 
studying reliability of complex systems, an individual or a 
system of components may be exposed to p different causes of 
death (failure) where $X_i$ is the time to death from the ith 
cause. Although one would like to know about the distribution 
of the $X_i$'s, only observations on U's will be available. 
Basu and Ghosh (1980) give some of these examples. For other 
examples and a survey of the area see Birnbaum (1979) and David 
and Moeschberger (1978). 

When the $X_i$'s are not identically distributed, the problem 
of identifiability is illustrated by the following example. 


Example 1. Let $X_i$ (i = 1,2,3,4) be independent random variables 
and suppose $X_i$ is exponentially distributed with $E(X_i) = 1/\lambda_i$. 
If $\lambda_1 + \lambda_2 = \lambda_3 + \lambda_4$, then $\min(X_1,X_2)$ and $\min(X_3,X_4)$ are 
identically distributed. 
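
The nonidentifiability in Example 1 comes from the fact that the minimum of independent exponential variables is again exponential, with rate equal to the sum of the individual rates. A minimal sketch, with hypothetical rate values and an illustrative function name:

```python
import math


def min_survival(rates, t):
    """P(min(X_1, ..., X_p) > t) for independent exponentials with these rates."""
    return math.prod(math.exp(-lam * t) for lam in rates)


pair_a = (1.0, 4.0)   # hypothetical (lambda_1, lambda_2)
pair_b = (2.0, 3.0)   # hypothetical (lambda_3, lambda_4), same sum

if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 2.0):
        print(t, min_survival(pair_a, t), min_survival(pair_b, t))
```

Both pairs give the same survival function for the minimum, so observing U alone cannot separate them.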
It is possible to introduce additional random variables so 
that the enhanced family with the added information becomes iden- 
tifiable. In this case we call the original family of 
distributions rectifiable. Let I be an integer-valued random 
variable (I = 1,2,...,p). (U,I) is called an identified minimum 
if I = k when $U = \min(X_1, X_2, \ldots, X_p) = X_k$. In the absence of 
I, U will be called a non-identified minimum. 


Basu and Ghosh (1980) consider a dual problem, called the 
problem of complementary risks, where instead of observing the 
minimum (identified or non-identified) one observes the maximum 
$V = \max(X_1, X_2, \ldots, X_p)$. Basu and Ghosh describe examples showing 
that this problem also occurs naturally in survival analysis and 
reliability theory. 


We now consider the case when the $X_i$'s are independently 
but not identically distributed. The joint probability distri- 
bution of (U,I) is specified by the monotonic functions 
$H_k(x) = P(U \le x, I = k)$, k = 1,2,...,p. 
Berman (1963) has obtained the following theorem. 


Theorem 1. The set of functions $\{H_k(x)\}$ is related to the set 
$\{F_k(x)\}$ by the functional equations 

$$H_k(x) = \int_{-\infty}^{x} \prod_{j \ne k} [1 - F_j(t)]\,dF_k(t), \quad k = 1,2,\ldots,p.$$

The solution of this set of equations is 

$$F_k(x) = 1 - \exp\Big\{-\int_{-\infty}^{x} \Big[1 - \sum_{j=1}^{p} H_j(t)\Big]^{-1} dH_k(t)\Big\},$$

where k = 1,2,...,p. 
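
The inversion formula in Theorem 1 can be tried out on a case where everything is known in closed form. The sketch below assumes independent exponential lives with hypothetical rates, so that H_k(x) = (lambda_k / Lambda)(1 - exp(-Lambda x)) with Lambda the sum of the rates; the lower limit of integration is taken as 0 because the lives are nonnegative, and the Stieltjes integral is approximated by a midpoint sum. The function names are illustrative.

```python
import math


def make_subdistributions(rates):
    """H_k(x) = P(U <= x, I = k) for independent exponential lives."""
    total = sum(rates)
    return [
        (lambda x, lam=lam: lam / total * (1.0 - math.exp(-total * x)))
        for lam in rates
    ]


def berman_inverse(H, k, x, n=2000):
    """Approximate F_k(x) = 1 - exp(-int_0^x dH_k(t) / (1 - sum_j H_j(t)))."""
    step = x / n
    integral = 0.0
    for i in range(n):
        lo, hi = i * step, (i + 1) * step
        mid = 0.5 * (lo + hi)
        at_risk = 1.0 - sum(H_j(mid) for H_j in H)   # P(U > mid)
        integral += (H[k](hi) - H[k](lo)) / at_risk
    return 1.0 - math.exp(-integral)


rates = (0.5, 1.0, 2.0)            # hypothetical failure rates
H = make_subdistributions(rates)

if __name__ == "__main__":
    for k, lam in enumerate(rates):
        print(k, berman_inverse(H, k, 1.0), 1.0 - math.exp(-lam))
```

The recovered values should match the exponential distribution functions 1 - exp(-lambda_k x) up to discretization error.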

Since the minimum U and the maximum V satisfy the rela- 
tion $\max(X_1, X_2, \ldots, X_p) = -\min(-X_1, -X_2, \ldots, -X_p)$, it follows that 
similar results also hold for the maximum. 

The next natural questions are: (a) What if the minimum 
(maximum) is not identified? Can we obtain the distributions of 
the $X_i$ from that of U? And (b) What can we say about the 
identifiability of the $F_i$'s if the $X_i$'s are not independently 
distributed? We will discuss these next. 
3. INDEPENDENT RANDOM VARIABLES WITH NONIDENTIFIED EXTREMUM 


In case the extremum U is not identified, one can still 
uniquely determine F(x) under certain conditions. To this end 


Basu and Ghosh (1980) obtain the following theorem. 


Theorem 2. Let $\mathcal{F}$ be a family of pdf's on $R_1$ with support (a,b) 
which are continuous and are positive to the left of some point 
A and such that if f and g are any two distinct members of 
$\mathcal{F}$ then $\lim \{f(x)/g(x)\}$ as $x \to a$ exists and equals either 0 
or $\infty$. Let $X_1, X_2, \ldots, X_p$ be independent random variables with 
respective pdf's $f_1, f_2, \ldots, f_p$ in $\mathcal{F}$ and let $Y_1, Y_2, \ldots, Y_q$ be 
independent random variables with respective pdf's belonging to 
$\mathcal{F}$. If $\min(X_1, \ldots, X_p)$ and $\min(Y_1, \ldots, Y_q)$ have identical dis- 
tributions, then p = q and there exists a permutation 
$(k_1, k_2, \ldots, k_p)$ of (1,2,...,p) such that the pdf of $Y_i$ 
is $f_{k_i}$ (i = 1,2,...,p). 
i 
Anderson and Ghurye (1977) proved a similar theorem for 
the maximum. As an application of the above theorem, consider 
the following examples. 


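Example 2. $\mathcal{F}$ is the family of normal distributions 


$$\phi(x \mid \mu, \sigma) = \exp[-(x-\mu)^2/2\sigma^2] \big/ \sqrt{2\pi\sigma^2}.$$


Now, 

$$\lim_{x \to -\infty} \frac{\phi(x \mid \mu_1, \sigma_1)}{\phi(x \mid \mu_2, \sigma_2)} = \begin{cases} 0, & \text{if } \sigma_1 < \sigma_2, \text{ or } \sigma_1 = \sigma_2 \text{ and } \mu_1 > \mu_2, \\ \infty, & \text{if } \sigma_1 > \sigma_2, \text{ or } \sigma_1 = \sigma_2 \text{ and } \mu_1 < \mu_2. \end{cases}$$


Conditions of the above theorem are met. Hence the distribu- 
tions are identifiable. 


Example 3. $\mathcal{F}$ is the family of exponential distributions 


$$f_\lambda(x) = \lambda \exp(-\lambda x), \quad x > 0.$$


Here 

$$\lim_{x \to 0} \frac{f_{\lambda_1}(x)}{f_{\lambda_2}(x)} = \begin{cases} 1, & \text{if } \lambda_1 = \lambda_2, \\ \lambda_1/\lambda_2, & \text{if } \lambda_1 \ne \lambda_2. \end{cases}$$

The conditions of the above theorem are not met. By Example 1, 
the distributions are not identifiable. (Note, however, if the 
maximum is observed both normal and exponential distributions 
are identifiable.) 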


There may, however, be situations when the conditions of 
the above theorem are not met and yet the underlying family of 
distributions is identifiable. For example, Basu and Ghosh 
(1980) have proved the following theorems. 


Theorem 3. Let $X_i$ have a gamma distribution with index $\alpha_i$ 
and scale parameter $\beta_i$ (i = 1,2,3,4). Assume that $\alpha_1$ and 
$\alpha_2$ are not both equal to one and $\alpha_3$ and $\alpha_4$ are not both 
equal to one. Let $X_1$ and $X_2$ be independent and let $X_3$ 
and $X_4$ be independent. If the distribution of $\min(X_1,X_2)$ is 
identical with that of $\min(X_3,X_4)$, then either 


$(\alpha_1,\alpha_2) = (\alpha_3,\alpha_4)$ and $(\beta_1,\beta_2) = (\beta_3,\beta_4)$ 


or $(\alpha_1,\alpha_2) = (\alpha_4,\alpha_3)$ and $(\beta_1,\beta_2) = (\beta_4,\beta_3)$. 


Theorem 4. Let $X_i \sim W(p_i,\theta_i)$ (i = 1,2,3,4) be independent 
Weibull random variables. If the distribution of $\min(X_1,X_2)$ 
is the same as that of $\min(X_3,X_4)$, then either 


$(p_1,\theta_1) = (p_3,\theta_3)$ and $(p_2,\theta_2) = (p_4,\theta_4)$, 


or $(p_1,\theta_1) = (p_4,\theta_4)$ and $(p_2,\theta_2) = (p_3,\theta_3)$, 


provided $p_1 \ne p_2$. 
In Theorem 4 we exclude the case $p_1 = p_2$. For if 
$p_1 = p_2 = p$, say, the problem, after using the transformation 
$Y_i = X_i^p$ (i = 1,2), reduces to that of the exponential distri- 
bution. 


4. DEPENDENT RANDOM VARIABLES 


We next consider the case when $(X_1, X_2, \ldots, X_p)$ are depen- 
dent. We restrict our discussion primarily to the case p = 2. 
There are many physical situations where it is desirable to test 
the assumption that the $X_i$'s are independent. It is therefore 
natural to study the extent to which the distribution of (U,I), 
or of U alone, identifies the joint distribution of the $X_i$'s. To 
this end Basu and Ghosh (1978) pointed out some difficulties, 
using the following construction. Let 
$\bar{F}(x_1,x_2) = P(X_1 > x_1, X_2 > x_2)$ and 
$\bar{F}_i(x_1,x_2) = \partial \bar{F}(x_1,x_2)/\partial x_i$ (i = 1,2). 
For simplicity assume that the density of $(X_1,X_2)$ is every- 
where positive. Let 


$$\bar{G}_i(x) = \exp\Big\{-\int_{-\infty}^{x} [-\bar{F}_i(z,z)]\,[\bar{F}(z,z)]^{-1}\,dz\Big\}$$

and assume $\int_{-\infty}^{\infty} [-\bar{F}_i(z,z)]\,[\bar{F}(z,z)]^{-1}\,dz$ diverges for i = 1,2. 


Then $G_i(x) = 1 - \bar{G}_i(x)$ is a distribution function and (U,I) 
has the same distribution whether $(X_1,X_2)$ is distributed 
according to $\bar{F}(x_1,x_2)$ or according to $\bar{G}(x_1,x_2) = \bar{G}_1(x_1) \cdot \bar{G}_2(x_2)$. 
Thus the problem could have a satisfactory solution only if F 
is known to belong to a well-specified parametric family of dis- 
tributions. Similar results for nonidentifiability, in the 
absence of specific parametric models, have also been considered 
by Miller (1977), Tsiatis (1975), and Rose (1973). Tsiatis 
(1978) further illustrated the magnitude of this problem with 
some actual data. 


No general result on identifiability for dependent random 
variables is currently available. However Peterson (1975) has 
obtained some interesting inequalities when p = 2. 


Bivariate Normal Distribution. In case of specific parametric 
models, questions of identifiability have been settled for a 
number of distributions. We summarize some of these results. 
Let $(X_1,X_2) \sim \mathrm{BVN}(\mu_1,\mu_2,\sigma_1,\sigma_2,\rho_{12})$. Nadas (1971) showed that, 


Lae ©< Pro <1, the distribution of the identified minimum of 


a normal pair determines the distribution of the pair. Nadas' 
proof is not complete, however. Basu and Ghosh (1978) completed 


Nadas' proof and extended it to the case of nonidentified minimum. 


Recently Gilliland and Hannan (1980) gave an elegant solution of 
the problem. 


No result for the general case is available as yet. However 
Basu and Ghosh (1978) established the identifiability of the tri- 
variate normal distribution given the distribution of the iden- 
tified minimum and given that, for each pair of random variables 
$(X_i, X_j)$ with correlation coefficient $\rho_{ij}$ and deviations $\sigma_i$ and 


fk < Se gid 
1 P4494; 0; Agel SB; 2 $33, mia 


Most of the identification problems considered so far are 
also valid when the maximum V, instead of the minimum U, is 
observable. For if X is p-variate normal, so is -X, and 
$\max(X_1, \ldots, X_p) = -\min(-X_1, \ldots, -X_p)$, and thus identification 
problems for the maximum can be restated in terms of corres- 
ponding problems for the minimum. Also, any bivariate distri- 
bution obtained through strict monotone transformation of normal 
variables will be identifiable. The bivariate lognormal distri- 
bution is thus identifiable. 


Next we consider identifiability of several bivariate 
distributions useful in reliability theory and survival analysis. 
In particular, we consider bivariate exponential distributions. 
A survey of some of these distributions is presented in Basu and 
Block (1975). These include the bivariate exponential distribu- 
tions of Marshall and Olkin (1967), Block and Basu (1974), and 
Gumbel (1960). Basu and Ghosh (1978, 1980) have considered 
identifiability of these distributions. Their results are 
summarized below. 


(a) Marshall and Olkin Bivariate Exponential. The tail prob- 
ability of this distribution is given by 


$$\bar{F}(x_1,x_2) = \exp[-\lambda_1 x_1 - \lambda_2 x_2 - \lambda_{12}\max(x_1,x_2)],$$


2X E Aye 
= bab iekes Kiel as acl 


Here all parameters are identifiable if (U,I) is observed. 
However, if only U is observed the parameters are not identi- 
fiable. 


(b) Block-Basu Model. Here the joint density function is given 
by 


$$f(x_1,x_2) = \begin{cases} \{\lambda_1 \lambda (\lambda_2+\lambda_{12})/(\lambda_1+\lambda_2)\} \exp\{-\lambda_1 x_1 - (\lambda_2+\lambda_{12}) x_2\}, & \text{if } x_1 < x_2, \\ \{\lambda_2 \lambda (\lambda_1+\lambda_{12})/(\lambda_1+\lambda_2)\} \exp\{-(\lambda_1+\lambda_{12}) x_1 - \lambda_2 x_2\}, & \text{if } x_1 > x_2, \end{cases}$$

where $\lambda = \lambda_1 + \lambda_2 + \lambda_{12}$. Here the parameters are not identi- 
fiable at all. Note that the model proposed by Freund (1961) is 
also not identifiable, since it is related to the Block-Basu 
model. 


Because of the underlying physical assumptions neither 
Marshall-Olkin nor Block-Basu is considered a suitable physical 
model when the maximum is observed. 


(c) Gumbel Model I. Gumbel (1960) proposed two bivariate expon- 
ential distributions. The first is given by 


$$F(x_1,x_2) = 1 - \exp(-\lambda_1 x_1) - \exp(-\lambda_2 x_2) + \exp(-\lambda_1 x_1 - \lambda_2 x_2 - \lambda_{12} x_1 x_2),$$


where $x_1, x_2, \lambda_1, \lambda_2 > 0$, $\lambda_{12} > 0$. Here, the parameters are iden- 
tifiable if (U,I) is observed. However, if the nonidentified 
minimum U is observed only $\lambda_{12}$ and $\lambda_1 + \lambda_2$ are identifiable. 
If the nonidentified maximum $V = \max(X_1,X_2)$ is observable, 
then $\lambda_{12}$ is identifiable and $(\lambda_1,\lambda_2)$ is identifiable up to 
a permutation. 
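
The statement about the nonidentified minimum in Gumbel's first model can be illustrated directly from the distribution function. In the sketch below the parameter values are hypothetical and the function names are illustrative; the survival function of U is obtained by inclusion and exclusion from the joint distribution function and its exponential marginals.

```python
import math


def gumbel1_cdf(x1, x2, lam1, lam2, lam12):
    """Joint distribution function of Gumbel's first bivariate exponential."""
    return (1.0 - math.exp(-lam1 * x1) - math.exp(-lam2 * x2)
            + math.exp(-lam1 * x1 - lam2 * x2 - lam12 * x1 * x2))


def min_survival(t, lam1, lam2, lam12):
    """P(U > t) = 1 - F_1(t) - F_2(t) + F(t, t), with exponential marginals."""
    f1 = 1.0 - math.exp(-lam1 * t)
    f2 = 1.0 - math.exp(-lam2 * t)
    return 1.0 - f1 - f2 + gumbel1_cdf(t, t, lam1, lam2, lam12)


params_a = (1.0, 3.0, 0.5)   # hypothetical (lambda_1, lambda_2, lambda_12)
params_b = (2.5, 1.5, 0.5)   # same lambda_1 + lambda_2 and same lambda_12

if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 2.0):
        print(t, min_survival(t, *params_a), min_survival(t, *params_b))
```

Both parameter sets give P(U > t) = exp(-(lambda_1 + lambda_2) t - lambda_12 t^2), so U alone determines only the sum of the first two parameters and lambda_12.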


(d) Gumbel Model II. Here the distribution function is given 
by 


$$F(x_1,x_2) = [1 - \exp(-\lambda_1 x_1)][1 - \exp(-\lambda_2 x_2)][1 + \lambda_{12}\exp(-\lambda_1 x_1 - \lambda_2 x_2)].$$


If U is observable, $\lambda_{12}$ is identifiable and $(\lambda_1,\lambda_2)$ is 
identifiable up to a permutation. 


(e) Bivariate Weibull Distribution. The bivariate Weibull dis- 
tribution can be defined by its survival function 

$$\bar{F}(x_1,x_2) = \exp\{-\lambda_1 x_1^{p_1} - \lambda_2 x_2^{p_2} - \lambda_{12}\max(x_1^{p_1}, x_2^{p_2})\}.$$


If U is observable, Py and p are identifiable up to permu- 


2 


talon cal A, + 
ation so ds and 9 uo? OF. A.) (and Ay ao dio are 


2 
identifiable. 
5. ESTIMATION OF PARAMETERS 


Estimation of parameters based on (U,I), the identified 
minimum, has been considered extensively. For the bivariate 
normal distribution, Basu and Ghosh (1978) have considered the 
estimation based on U alone. Similar results can be obtained 
using V. 


For the general bivariate model the pdf of V, assuming 
independence, is given by 


$$f(t) = f_1(t)F_2(t) + f_2(t)F_1(t).$$


The parameters can therefore be estimated numerically using the 
method of maximum likelihood. The method of moments may provide 
a simpler technique. 
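
A numerical maximum likelihood fit based on the density of V might be sketched as follows. Everything specific here is an assumption made for illustration: the component lives are taken to be exponential, the "true" rates and the grid are hypothetical, the data are simulated with a fixed seed, and a crude grid search stands in for a proper optimizer. Because the density is symmetric in the two rates, the search is restricted to lam1 <= lam2.

```python
import math
import random


def max_pdf(t, lam1, lam2):
    """Density of V = max(X_1, X_2): f_1 F_2 + f_2 F_1, exponential lives."""
    f1, f2 = lam1 * math.exp(-lam1 * t), lam2 * math.exp(-lam2 * t)
    F1, F2 = 1.0 - math.exp(-lam1 * t), 1.0 - math.exp(-lam2 * t)
    return f1 * F2 + f2 * F1


def log_likelihood(sample, lam1, lam2):
    return sum(math.log(max_pdf(t, lam1, lam2)) for t in sample)


def grid_mle(sample, grid):
    """Crude maximum likelihood over a grid, restricted to lam1 <= lam2."""
    candidates = [(a, b) for a in grid for b in grid if a <= b]
    return max(candidates, key=lambda ab: log_likelihood(sample, *ab))


rng = random.Random(0)                    # fixed seed, illustration only
TRUE_RATES = (1.0, 2.0)                   # hypothetical parameter values
sample = [max(rng.expovariate(TRUE_RATES[0]), rng.expovariate(TRUE_RATES[1]))
          for _ in range(200)]
grid = [0.25 * i for i in range(1, 17)]   # 0.25, 0.50, ..., 4.00
estimate = grid_mle(sample, grid)

if __name__ == "__main__":
    print("grid MLE:", estimate)
```

The symmetry of the likelihood in (lam1, lam2) reflects the fact that, from V alone, the two rates can be identified at most up to a permutation.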


6. CONCLUDING REMARKS 


In this section we point out a number of areas in which 
additional work is being carried out and we also point out a few 
open problems. 


So far we have assumed, in the competing risks problem, that 
the observed random variable U is the minimum of p 
(unobserved) random variables $X_1, X_2, \ldots, X_p$. In analyzing mor- 
tality data the above is interpreted as follows. Consider a 
population in which p causes of death, $C_1, C_2, \ldots, C_p$, are 
operating. Each individual in this population is exposed to the 
risk of dying from any one of these causes. One can recognize two 
kinds of distributions associated with death due to cause $C_i$: 


(a) the survival distribution due to cause $C_i$, con- 
ditionally that $C_i$ is the cause of death, in the presence of 
other causes; and (b) the survival distribution due to 
$C_i$ when $C_i$ is acting alone. 

It is tacitly assumed that corresponding forces of mortality 


(failure rates) are the same. Gail (1975), Elandt-Johnson (1976), 
and others have explored the implication of this assumption. 


A second assumption is that the potential survival times 
$X_i$'s are independently distributed with continuous distribution 
function $F_i$ (i = 1,2,...,p). Some results to this end are 
given in Sections 2, 3, and 4. For another interesting direction 
of research see Miller (1977), Desu and Narula (1977), Langberg, 
Proschan, and Quinzi (1977, 1978), and the references therein. 
Desu and Narula consider the problem of estimating the distri- 
bution function $F_i(t) = P(X_i \le t)$ and provide a sufficient 
condition on the distribution of $(X_1, \ldots, X_p)$ under which such 
an estimation is possible. 


A third direction of research has been the interpretation 
of competing risks problems in terms of some stochastic process. 
Chiang (1968) has studied the problem of competing risks using 
time-nonhomogeneous Markov processes. Clifford (1977) and Berlin, 
Brodsky, and Clifford (1977) have considered the problem of iden- 
tifiability for this situation. 


The problems described in the previous sections can be 
extended in several directions. The author is currently working 
on some of the problems stated below. 


(a) As mentioned before, Theorem 2 of Section 3 does not 
provide a necessary and sufficient condition. It would be desir- 
able to improve on this result. 


(b) Most of the identification results obtained so far 
apply to the case of two competing causes. These need to be 
generalized to the case of any number of variables. Some results 
to this end have been obtained by Basu and Ghosh (1980b). Algor- 
ithms for estimating the parameters should also be obtained. 


(c) The concept of competing risks is well known. Basu 
and Ghosh (1980) coined the term complementary risks for the dual 
problem. In reliability theory the corresponding problems are 
for series and parallel systems. It is natural to pose the 
following general problem corresponding to a k-out-of-p system 
($k \le p$). Recall a system is called k-out-of-p if the system 
operates so long as k or more components function. Let $X_{(r)}$ 
be the rth order statistic among $X_1, X_2, \ldots, X_p$. Suppose only 
$X_{(r)}$ is observable, where r = p - k + 1. We assume the $X_i$'s 
are independent. 


Given the distribution of some $X_{(r)}$, can we uniquely 
determine the distribution of each $X_i$ (i = 1,2,...,p)? If 
r = 1, we obtain the case of competing risks, and if r = p, the 
case of complementary risks. For identically distributed X's, 
specification of the distribution of any order statistic is known 
to completely determine the common distribution of the X's (cf. 
Galambos, 1975, p. 79). Therefore we turn to the case in which 
the X's are not equal in distribution, and we consider the rth 
identified order statistic $(X_{(r)}, I)$ where $X_{(r)} = X_k$ when 
I = k. By Berman's theorem (cf. Section 2), the distributions of 
the X's are determined by the distribution of the identified 
minimum (or maximum). For exponentially distributed variates we 
have the following general result. 


Theorem 5. Let $X_1, X_2, \ldots, X_p$ be independent and exponentially 
distributed with $E(X_i) = 1/\lambda_i$. Then, for any r, the joint 
distribution of the rth identified order statistic uniquely 
determines the values of $\lambda_1, \lambda_2, \ldots, \lambda_p$. 


Proof. Let $X_i$ have density $f_i$ and survival function $\bar{F}_i$, 
and consider the joint density function of $(X_{(r)}, I)$, given by 


$$\frac{d}{dt} P(X_{(r)} \le t, I = k), \quad k = 1,2,\ldots,p. \qquad (1)$$


For each k, this is assumed to be a known function of t. For 
notational simplicity, take p = 3 and r = 2. With k = 1, 
say, the joint density (1) becomes 


$$f_1(t)[F_2(t)\bar{F}_3(t) + \bar{F}_2(t)F_3(t)] = \lambda_1 e^{-\lambda_1 t}[e^{-\lambda_2 t} + e^{-\lambda_3 t} - 2e^{-(\lambda_2+\lambda_3)t}]. \qquad (2)$$

Since distinct exponential functions are linearly independent, 
$-2\lambda_1$ is uniquely determined as the coefficient of the most 
rapidly decreasing exponential in (2). Similarly for $\lambda_2$ and 
$\lambda_3$. The proof for general p and r is analogous: $\lambda_k$ is a 
known multiple of the coefficient of the most rapidly decreasing 
exponential in the kth section of (1). 
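
The expansion (2) used in the proof can be checked numerically for p = 3 and r = 2. The sketch below uses hypothetical rates and illustrative function names; it compares the density written in terms of the component distribution and survival functions with its expansion into exponentials, and also evaluates P(I = k) by integrating the expansion in closed form.

```python
import math


def section_density(t, rates, k):
    """Density of (X_(2), I = k) for p = 3 independent exponential lives."""
    lam_k = rates[k]
    a, b = [rates[j] for j in range(3) if j != k]
    f_k = lam_k * math.exp(-lam_k * t)
    surv_a, surv_b = math.exp(-a * t), math.exp(-b * t)
    # Exactly one of the other two components has failed before time t.
    return f_k * ((1.0 - surv_a) * surv_b + surv_a * (1.0 - surv_b))


def section_density_expanded(t, rates, k):
    """The same density written as a combination of exponentials, as in (2)."""
    lam_k = rates[k]
    a, b = [rates[j] for j in range(3) if j != k]
    return lam_k * (math.exp(-(lam_k + a) * t) + math.exp(-(lam_k + b) * t)
                    - 2.0 * math.exp(-(lam_k + a + b) * t))


def prob_second_failure_is(rates, k):
    """P(I = k): the expanded density integrated over t in closed form."""
    lam_k = rates[k]
    a, b = [rates[j] for j in range(3) if j != k]
    return lam_k * (1.0 / (lam_k + a) + 1.0 / (lam_k + b)
                    - 2.0 / (lam_k + a + b))


rates = (1.0, 2.0, 3.0)   # hypothetical failure rates

if __name__ == "__main__":
    for k in range(3):
        print(k, prob_second_failure_is(rates, k),
              section_density(0.4, rates, k),
              section_density_expanded(0.4, rates, k))
```

In the expanded form the fastest-decaying term has exponent equal to the sum of all three rates and coefficient -2 lambda_k, which is the quantity the proof reads off.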


The preceding argument yields a slightly more general result: 
specification of the kth section of (1) determines $\lambda_k$ uniquely 
and determines the other $\lambda$'s up to a permutation. Further 
results along these lines have been obtained by Basu and Ghosh 
(1980b). 
DEPENDENCE CONCEPTS FOR STOCHASTIC PROCESSES 


DENNIS S. FRIDAY 


Statistical Engineering Laboratory 
National Bureau of Standards 
Boulder, Colorado 80303 USA 


SUMMARY. Dependence concepts are a relatively recent development 
in multivariate distribution theory. They allow dependence to be 
incorporated into a problem without requiring specific model 
assumptions, and define classes of multivariate distributions with 
useful properties. It is not apparent that related notions have 
also evolved in the theory and applications of stochastic pro- 
cesses. Multivariate concepts and stochastic process concepts, 
however, have had no influence on each other. In this paper an 
overview is presented of the work in stochastic processes that is 
analogous to multivariate dependence concepts. 


KEY WORDS. Dependence concepts, fractional Brownian motion, 
mixing, power-law spectra, self-similarity, stochastic processes. 


1. INTRODUCTION 


Stochastic independence may be thought of as a singularity 
in the continuum of all possible stochastic relationships among 
a set of random variables. It is an assumption that pervades 
both theoretical and applied statistical work. The reasons for 
this perhaps excessive use of independence may be attributed to 
two complementary factors: (i) The mathematical tractability 
of statistical problems under the assumption of independence, and 
consequently, the availability of the numerous known results. 
(ii) The comparative void that confronts a statistician who 
attempts to introduce dependence into a problem without making 
very restrictive assumptions about its form. 
While new results are continually being published that make 
the first factor even more attractive, much recent work on depen- 
dence concepts for multivariate distributions is generally 
directed toward neutralizing the second factor. Considerable 
progress has been made in a comparatively short time span, much 
of it oriented toward reliability applications. Dependence con- 
cepts that have been developed include: Association, Positive 
Quadrant Dependence, Positive and Negative Dependence, and many 
others. See Ahmed et al. (1978), Barlow and Proschan (1975), 
Block et al. (1980), Esary and Proschan (1972), Kimeldorf and 
Sampson (1978), and Lehmann (1966). 


This work on dependence concepts has not been linked with 
general problems in stochastic processes even though finite dimen- 
sional distributions are central in the study of the latter. 
Special cases have been considered [Ross (1979), Esary et al., 
(1967)]. It is also not apparent that related notions have 
evolved within the field of stochastic processes. We show that 
such work does exist and that general concepts of dependence are 
important for both theory and applications. The objective of 
this paper is to present a firm, heuristic understanding of the 
more important concepts without all of the mathematical details. 
The theorems and their proofs and many perturbations thereof exist 
in the literature. The overview we hope to present does not. 


2. GENERAL PROPERTIES OF RANDOM PROCESSES 


A few relevant definitions and preliminaries are in 
order. Univariate, real valued stochastic processes will 
be of primary interest here. Let $(\Omega, \mathcal{F}, \mu)$ be a probability 
space and let T be the set of indices of the process. Then 
the finite, real valued function $X(t, \omega)$ is a stochastic process 
if it is a measurable function of $\omega \in \Omega$ for each $t \in T$. We 
will suppress the argument $\omega$ and write X(t). In most cases, 
T will be either the real line or the (doubly infinite) integers. 
For any fixed $t \in T$, X(t) is a random variable with distribu- 
tion function $F(x;t) = \mu(\{X(t) \le x\})$. Similarly a finite set 
$\{t_1, \ldots, t_n\}$, $t_i \in T$, defines a random vector $X(t_1), \ldots, X(t_n)$ 
with the multivariate distribution function 


$$F(x_1, \ldots, x_n; t_1, \ldots, t_n) = \mu(\{X(t_1) \le x_1, \ldots, X(t_n) \le x_n\}).$$


Any distribution of this type, derived from the X(t) process, 
is called a finite dimensional distribution. The family of all 
of its finite dimensional distributions uniquely defines a stochas- 
tic process [Kolmogorov (1940)]. 

Assume that the finite dimensional distributions have finite 
first and second order moments. The mean function of a stochastic 
process X(t) is given by m(t) = E[X(t)], $t \in T$. It is a 
function of t, and when it is convenient the zero mean process 
X(t) - m(t) will be used instead of X(t). The covariance function 
of X(t) (zero mean) is given by C(s,t) = E[X(s)X(t)]; $s,t \in T$. 
It has the following properties: Symmetry: C(s,t) = C(t,s); 
Non-negativity: $C(t,t) = E[X(t)^2] \ge 0$; Schwarz Inequality: 
$|C(s,t)|^2 \le E|X(s)|^2\, E|X(t)|^2$; and Nonnegative Definiteness: 

$$\sum_{i,j=1}^{n} C(t_i,t_j)\,\alpha_i \alpha_j \ge 0 \quad \text{for all } \{t_1,\ldots,t_n\},\ t_i \in T,$$

and all $\alpha_1,\ldots,\alpha_n$. 
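
The four properties of a covariance function can be checked numerically on any finite set of time points. The sketch below is only an illustration: the covariance rho**|s - t| and the chosen time points are assumptions made for the example, not taken from the text.

```python
import numpy as np

def covariance_matrix(cov, times):
    """Matrix [C(t_i, t_j)] for a covariance function cov(s, t)."""
    t = np.asarray(times, dtype=float)
    return cov(t[:, None], t[None, :])

def exp_cov(s, t, rho=0.7):
    """Hypothetical stationary covariance C(s, t) = rho**|s - t|."""
    return rho ** np.abs(s - t)

times = [0.0, 1.0, 2.5, 4.0, 7.0]          # arbitrary illustrative time points
C = covariance_matrix(exp_cov, times)

# Symmetry: C(s, t) = C(t, s)
symmetric = bool(np.allclose(C, C.T))
# Non-negativity: C(t, t) >= 0
nonneg_diag = bool(np.all(np.diag(C) >= 0))
# Schwarz inequality: |C(s, t)|^2 <= C(s, s) C(t, t)
schwarz = bool(np.all(C**2 <= np.outer(np.diag(C), np.diag(C)) + 1e-12))
# Nonnegative definiteness: smallest eigenvalue is >= 0 up to rounding
min_eig = float(np.linalg.eigvalsh(C).min())
```

Checking the eigenvalues of the matrix is equivalent to checking the quadratic form for every choice of coefficients, which is why a single eigenvalue computation suffices here.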

A stochastic process X(t) is strictly stationary if for 
any $t_1,\ldots,t_n \in T$, and any s such that $t_i + s \in T$, 
the random vectors $X(t_1),\ldots,X(t_n)$ and $X(t_1+s),\ldots,X(t_n+s)$ 
have the same finite dimensional distributions. Any finite 
dimensional distribution of a strictly stationary process is invariant 
under a translation in time. A weaker definition is often used. 
A stochastic process X(t) is stationary if E[X(t)] = m and 
E[X(s)X(t)] = C(t-s). A stationary process has a constant 
mean value and the covariance between the random variables 
defined at any two time points is a function only of the distance 
between these points. 

An important aspect of stochastic processes 
is their spectral representations. Let X(t); $t \in (-\infty, \infty)$ be 
a real valued stationary stochastic process with covariance 
function $C(\tau)$. Then $C(\tau)$ has the representation: 

$$C(\tau) = \int_{-\infty}^{\infty} e^{i\lambda\tau}\, dS(\lambda); \quad -\infty < \tau < \infty,$$

where $S(\infty) - S(-\infty) = C(0) < \infty$ and $S(\lambda)$ is real, nondecreasing 
and bounded. The process X(t) is said to have spectral 
distribution function $S(\lambda)$. $S(\lambda)$ denotes the relative magnitude of 
harmonic components of frequency $\lambda$ present in X(t). The 
spectrum represents the distribution of the total variance C(0) of 
the process X(t) over frequency. The region where $|\lambda|$ is 
small is called the low frequency region and large $|\lambda|$ 
corresponds to the high frequency region. Note that the set in which 
$\lambda$ varies depends on whether X is continuous or discrete. In 
the equispaced, distance $\Delta$, discrete time case $C(\tau)$ has 
spectral representation for $\tau = 0, \pm\Delta, \pm 2\Delta, \ldots$ 

$$C(\tau) = \int_{-\pi/\Delta}^{\pi/\Delta} e^{i\lambda\tau}\, d\Big\{ \sum_{j=-\infty}^{\infty} \big[ S(\lambda + 2\pi j/\Delta) - S((2\pi j - \pi)/\Delta) \big] \Big\},$$

where the domain of the spectral density is now $\lambda \in [-\pi/\Delta, +\pi/\Delta]$. 
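
As a hedged numerical illustration of the discrete-time spectral representation (with spacing 1), the sketch below assumes a first-order autoregressive covariance C(tau) = rho**|tau|, whose spectral density on [-pi, pi] is known in closed form. The example process and all function names are assumptions of this sketch; it recovers the covariance by integrating the density against cos(lambda * tau).

```python
import numpy as np

def ar1_spectral_density(lam, rho):
    """Spectral density on [-pi, pi] of a unit-variance AR(1) covariance rho**|tau|."""
    return (1.0 - rho**2) / (2.0 * np.pi * (1.0 - 2.0 * rho * np.cos(lam) + rho**2))

def covariance_from_spectrum(tau, rho, n_grid=4096):
    """Approximate C(tau) = integral of exp(i*lam*tau) dS(lam) over [-pi, pi].

    The density is even in lam, so only the cosine part contributes.
    An equispaced rule is used because the integrand is periodic.
    """
    lam = -np.pi + 2.0 * np.pi * np.arange(n_grid) / n_grid
    integrand = np.cos(lam * tau) * ar1_spectral_density(lam, rho)
    return float(np.sum(integrand) * 2.0 * np.pi / n_grid)

rho = 0.6
recovered = [covariance_from_spectrum(tau, rho) for tau in range(6)]
expected = [rho**tau for tau in range(6)]
```

The value at tau = 0 is the total variance C(0), which is the total mass of the spectral distribution.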


The covariance and mean functions are clearly related to the 
serial dependence in a stochastic process. Deterministic 
dependencies, such as m(t), are simple in nature and not of concern 
in this study. The covariance function, however, possesses 
nontrivial information related to dependence in a stochastic 
process. Given a stochastic process with covariance function $C(\tau)$, 
the most direct link between the process and multivariate 
dependence concepts is to choose a set of time points $t_1,\ldots,t_n$ and 
apply the dependence concepts to $X(t_1),\ldots,X(t_n)$. We will 
digress, however, and consider certain aspects of dependence that 
have evolved solely within the theory and applications of 
stochastic processes. 


3. ASYMPTOTIC INDEPENDENCE CONDITIONS 


One class of dependence considerations originating in the 
theory of stochastic processes is asymptotic in nature and has no 
counterpart in multivariate distribution theory. Historically, 
ergodic theory and central limit theory for dependent sequences 
were the motivation for these concepts. Ergodicity is of 
fundamental importance for statistical applications involving 
stochastic processes. An ergodic process is one for which the time 
average and the ensemble average are equivalent. The practical 
implication is that the single "long-term" realization of an 
ergodic stochastic process may be used to infer properties of 
the family of all possible realizations of the process. That 
is: 

$$\lim_{n\to\infty} \frac{1}{n} \sum_{t=0}^{n} X(t;\omega) = E_{\omega}[X(t;\omega)].$$


Ergodic theory had its origins in Statistical Mechanics [Hopf 
(1937), Kac (1959)] and applications to probability followed 
[Rosenblatt (1956), Blum and Rosenblatt (1956), Rosenblatt (1961)]. 
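
The equivalence of time and ensemble averages can be illustrated by simulation. The following is a minimal sketch, assuming a stationary Gaussian first-order autoregressive process (an ergodic process chosen only for illustration; the parameter values and the function name are hypothetical). The average over one long realization should be close to the ensemble mean.

```python
import numpy as np

def simulate_ar1(n, mean=2.0, phi=0.5, sigma=1.0, seed=0):
    """One realization of X(t) = mean + phi * (X(t-1) - mean) + noise."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n)
    x = np.empty(n)
    # Draw the first value from the stationary distribution.
    x[0] = mean + eps[0] / np.sqrt(1.0 - phi**2)
    for t in range(1, n):
        x[t] = mean + phi * (x[t - 1] - mean) + eps[t]
    return x

path = simulate_ar1(100_000)
time_average = float(path.mean())   # average along a single realization
ensemble_mean = 2.0                  # E[X(t)] for this hypothetical process
```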


Let $\ldots, X_{-1}, X_0, X_1, \ldots$ be a strictly stationary 
stochastic process with $E(X_i) < \infty$ and let $L_r$ denote an event in 
the smallest $\sigma$-field generated by events on $\{\ldots, X_{r-1}, X_r\}$ and 
let $R_s$ denote an event in the corresponding $\sigma$-field on 
$\{X_s, X_{s+1}, \ldots\}$. Then a necessary and sufficient condition for 
ergodicity is that for all $L_r$ and $R_s$, 

$$\lim_{n\to\infty} \frac{1}{n} \sum_{s=1}^{n} P(L_r R_s) = P(L_r)\, P(R_s).$$

While the details are covered in various ways by many authors, 
the basic principle is that any events invariant under shift 
transformations must differ from the sample space $\Omega$ or the 
null set by, at most, measure zero. This property is also known 
as metric transitivity. 


Ergodicity, however, is not a strong enough condition for 
certain limit theorems. One stronger condition is called the 
mixing condition. Let classes of events $L_r$ and $R_s$ be as 
defined above. Then the process $\ldots, X_{-1}, X_0, X_1, \ldots$ is mixing 
if 

$$\lim_{s\to\infty} P(L_r R_s) = P(L_r)\, P(R_s).$$


Therefore, mixing means that the information between any two 
events decreases as they become separated in time (or space, etc.). 
More precisely, any two events become stochastically independent 
as the regions of the process, on which they are defined, move 
apart. It is also implicit in this definition that the events 
cannot be made large enough to offset this decreasing dependence. 


For certain applications the mixing condition is too weak. 
Zhurbenko (1975), for example, shows that it is not sufficient 
for certain time series results. A stronger definition of 
asymptotic independence is the following. Let $L_r$ and $R_s$ be as 
defined previously and let the function $g(x) \to 0$ as $x \to \infty$. 
Then the process $\{\ldots, X_{-1}, X_0, X_1, \ldots\}$ is said to be g-mixing 
if for all $L_r$, $R_s$, $r < s$, 

$$|P(L_r R_s) - P(L_r) P(R_s)| \le g(s-r).$$
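
A simple case in which a bounding function g can be written down explicitly is a stationary two-state Markov chain. The sketch below is an illustration under stated assumptions: the transition probabilities are hypothetical, and only events that depend on a single coordinate (X_r = i and X_s = j) are examined, which is much narrower than the sigma-fields in the definition.

```python
import numpy as np

a, b = 0.1, 0.2                       # hypothetical transition probabilities
P = np.array([[1 - a, a],
              [b, 1 - b]])            # transition matrix
pi = np.array([b, a]) / (a + b)       # stationary distribution (pi P = pi)

def max_dependence(k):
    """max over i, j of |P(X_r = i, X_{r+k} = j) - P(X_r = i) P(X_{r+k} = j)|."""
    Pk = np.linalg.matrix_power(P, k)
    joint = pi[:, None] * Pk          # joint law of (X_r, X_{r+k}) under stationarity
    return float(np.max(np.abs(joint - np.outer(pi, pi))))

def g(k):
    """Geometric bound; 1 - a - b is the second eigenvalue of P."""
    return 0.25 * abs(1 - a - b) ** k

gaps = [max_dependence(k) for k in range(1, 30)]
bounds = [g(k) for k in range(1, 30)]
```

For this chain the dependence between the two coordinates decays geometrically in the separation s - r, so g(x) tends to zero as required.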


Clearly, all g-mixing processes are mixing processes that 
have additional constraints on the rate at which independence is 
achieved. A comment on terminology is in order, since authors 
are not consistent. Our g-mixing is occasionally called uniform 
mixing or strong mixing, while other authors use the term strong 
mixing to describe our mixing, and the term weak mixing to des- 
cribe the mixing condition which is sufficient for ergodicity. 
Others, in particular the Russian literature, use the term 
regularity. Many variations of these conditions can be found 
in the literature [Kolmogorov and Rozanov (1960)]. 


4. LONG-TERM DEPENDENCE 


This section deals with mathematical descriptions of depen- 
dence which were originally motivated by observations of real 
world phenomena. While the previous section dealt with short- 
term dependence, we now consider processes for which there 
is a strong dependence even for large time lags. 


Hydrology of the Nile river was the stimulus for much of 
the work that we will discuss. Data on water flow in the Nile 
reveals that the next year, and many subsequent years are very 
likely to be similar to the current year. Years of drought are 
followed by years of drought, and years of flood are followed by 
years of flood. H. E. Hurst, an English physicist, recognized 
the phenomenon and developed statistical procedures to deal with a 
seemingly purely hydrological problem. B. B. Mandelbrot showed 
that such phenomena are far more general than hydrology and has 
contributed much to their mathematical description [Lawrance and 
Kottegoda (1977)]. Mandelbrot (1977) has developed convincing 
arguments that geographical coastlines, music, mountainous 
skylines, noise in electronic devices, and turbulence have much 
in common stochastically with river flow. 


This discussion of dependence begins with a statistical 
procedure that provides a measure of the dependence of a 
stochastic process [Mandelbrot and Taqqu (1979)]. 


If X(t); $t \ge 0$ is a continuous time random process let 

$$X^*(t) = \int_0^t X(s)\,ds, \quad X^{2*}(t) = \int_0^t X^2(s)\,ds, \quad \text{and} \quad X^{*2}(t) = (X^*(t))^2.$$

If X(t); t = 1,2,... is a discrete time process then let 

$$X^*(0) = 0, \quad X^*(t) = \sum_{s=1}^{t} X(s), \quad \text{and} \quad X^{2*}(t) = \sum_{s=1}^{t} X^2(s).$$

Now define for d > 0: $A(u) = X^*(u) - (u/d)\, X^*(d)$, 

$$R(d) = \sup_{0 \le u \le d} A(u) - \inf_{0 \le u \le d} A(u), \quad \text{and} \quad S^2(d) = \frac{1}{d}\, X^{2*}(d) - \frac{1}{d^2}\, X^{*2}(d).$$

Finally, Q(d) = R(d)/S(d); d > 0. R(d) is the adjusted range, 
S(d) the sample standard deviation, and Q(d) is the rescaled 
adjusted range. Q(d) is also called the R/S statistic. The 
lag d is the right end point of the time interval [0, d] 
considered. These definitions may be generalized to Q(t,d) by 
considering the interval [t, t+d]. 
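
The discrete-time definitions above translate directly into a few lines of code. This is a minimal sketch, not a reference implementation; the function name is hypothetical, the supremum and infimum are taken over u = 0, 1, ..., d, and a constant series (for which S(d) = 0) is not handled.

```python
import numpy as np

def rescaled_adjusted_range(x):
    """Q(d) = R(d) / S(d) for the sample x(1), ..., x(d)."""
    x = np.asarray(x, dtype=float)
    d = x.size
    cum = np.concatenate(([0.0], np.cumsum(x)))     # X*(u) for u = 0, ..., d
    u = np.arange(d + 1)
    adjusted = cum - (u / d) * cum[-1]               # A(u) = X*(u) - (u/d) X*(d)
    R = adjusted.max() - adjusted.min()              # adjusted range R(d)
    S2 = np.sum(x**2) / d - (cum[-1] / d) ** 2       # S^2(d)
    return R / np.sqrt(S2)

sample = np.array([1.0, 2.0, 3.0, 4.0])
q = rescaled_adjusted_range(sample)                  # R = 2, S^2 = 1.25
q_transformed = rescaled_adjusted_range(3.0 * sample + 7.0)
```

For the small sample shown, A(u) takes the values 0, -1.5, -2, -1.5, 0, so R(4) = 2 and S^2(4) = 30/4 - 100/16 = 1.25. Applying the function to 3x + 7 gives the same value of Q.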


Observe that Q(d) is invariant under linear transformations 
on X(t). That is, Q(d) has the same value for a sample of 
X(t) as for aX(t) + b. Furthermore, R(d) and S(d) are each 
invariant if a = 1, that is, under level shifts of X(t). The 
original derivation of Q(d) was heuristic and based on 
hydrological considerations. The objective was to determine the size of 
reservoir that, based upon some interval of past history of a 
river, would have contained all floods and for which the 
reservoir would never have run dry during [0, d]. The reservoir must 
then contain an initial quantity of -inf A(u). It is assumed 
that the water is flowing out of the reservoir at a constant rate 
so that the amount withdrawn, during time [0, d], is equivalent 
to the total discharge X*(d) into the reservoir. This 
information is very important, for example, in designing a dam. Hurst 
added the denominator S(d) as a normalizing factor. Mandelbrot 
and Wallis (1969) discovered that this was a very fortuitous 
choice. Among other properties they showed that Q(d) is robust 
against deviations from normality in X(t) and against 
$E[X^2(t)] = \infty$, infinite variance. 


The purpose of the Q(d) statistic is to identify whether 
or not the process from which a given set of data arose exhibits 
long-term dependence and, if it does, to characterize the level 
of dependence by a single number. The symbol J has been assigned 
to this number by Mandelbrot, who claims that the earliest 
reference to such phenomena occurs in the biblical story of Joseph. In 
slightly more detail, if there exists a real number J such that 
$d^{-J} Q(d)$ converges in distribution to non-degenerate limits as 
$d \to \infty$, then this J is called the asymptotic Hurst exponent or 
the R/S exponent. It has been shown that J has range $0 \le J \le 1$. 
Heuristically, if the Q(d) corresponding to a stochastic 
process can be normalized for some J then the sample Q(d)'s 
will fluctuate about the line $d^J$. A graphical representation 
of this fact on a log-log scale results in a straight line with 
slope J. An empirical plot of log Q(d) versus log d computed 
at various starting points t is called a pox diagram and appears 
as a uniform band of points centered about a straight line with 
positive slope J. The value J = 1/2 corresponds to a sequence 
of independent random variables. The values $J \ne 1/2$ represent 
various manifestations of long-term dependence. Before we can 
discuss this dependence in any more detail, it will be necessary 
to introduce some general classes of processes. 
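
The log-log straight-line description suggests a simple way to estimate J from data: compute Q over blocks of several lengths d and fit a line to log Q against log d. The sketch below is one possible version, under assumptions that are not in the text: non-overlapping blocks are used as the starting points t, the Q values at each lag are averaged before fitting (a pox diagram would plot them individually), and the lags are powers of two. For independent data the slope is expected near 1/2, although small-sample bias typically pushes it slightly higher.

```python
import numpy as np

def rs_statistic(x):
    """Rescaled adjusted range Q(d) of the sample x (d = len(x))."""
    x = np.asarray(x, dtype=float)
    d = x.size
    cum = np.concatenate(([0.0], np.cumsum(x)))
    adjusted = cum - (np.arange(d + 1) / d) * cum[-1]
    s2 = np.mean(x**2) - np.mean(x) ** 2
    return (adjusted.max() - adjusted.min()) / np.sqrt(s2)

def estimate_J(x, lags):
    """Slope of log(mean Q(t, d)) against log d over the given lags."""
    x = np.asarray(x, dtype=float)
    log_d, log_q = [], []
    for d in lags:
        starts = range(0, x.size - d + 1, d)         # non-overlapping [t, t+d]
        q_values = [rs_statistic(x[t:t + d]) for t in starts]
        log_d.append(np.log(d))
        log_q.append(np.log(np.mean(q_values)))
    slope, _intercept = np.polyfit(log_d, log_q, 1)
    return float(slope)

rng = np.random.default_rng(1)
white_noise = rng.normal(size=2**15)                 # independent observations
lags = [2**k for k in range(4, 12)]                  # d = 16, ..., 2048
J_hat = estimate_J(white_noise, lags)
```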


5. PROCESSES EXHIBITING LONG-TERM DEPENDENCE 


A class which provides a natural framework within which to 
study long-term dependence is that of self-similar processes. A 
stochastic process Z(t), $-\infty < t < \infty$, is self-similar with scaling 
exponent H if for all a > 0, Z(at) and $a^H Z(t)$ have identical 
finite dimensional distributions. We will use the symbol $\stackrel{d}{=}$ for 
equality of finite dimensional distributions, as in 
$Z(at) \stackrel{d}{=} a^H Z(t)$. The scaling exponent H is related to the level 
of long-term dependence. Let X(i) = Z(i+1) - Z(i) denote the 
increments process derived from Z(t). 


If Z(t) is a Brownian motion process, then the X(i) are 
normally distributed and independent. Brownian motion is the 
special case of self-similar processes when H = 1/2. Therefore, 
this Z(t) has H= 1/2 and its increments process X(i) has 
J = 1/2. However, the R/S exponent for Z(t) is J=1. 


A more general example of self-similarity is the class of 
stable processes developed by P. Levy with index $0 < \alpha \le 2$. 
If Z(t) is a stable process with $\alpha \ne 2$ then it has 
independent increments, infinite variance, and is self-similar with 
$H = 1/\alpha$. The increments process X(i) has J = 1/2 whatever 
the value of H. For $\alpha = 2$ the process reduces to Brownian 
motion. 


Fractional Brownian motion [Kolmogorov (1940), Levy (1953), 
Mandelbrot and Van Ness (1968)] is yet another example of a 
self-similar process. Recall that, for a particular integral type, 
fractional integrals are extensions to a continuum of the usual 
integer order. Ordinary Brownian motion is a certain integral 
(of order one) of white noise. Fractional Brownian motion is, 
therefore, a stochastic process that arises as a fractional integral 
of white noise. Let $b_0$ be a real number, let $H \in (0,1)$, and 
let B(t) denote ordinary Brownian motion. Define fractional 
Brownian motion $B_H(t)$, t > 0, as follows: $B_H(0) = b_0$, 

$$B_H(t) - B_H(0) = \frac{1}{\Gamma(H+1/2)} \Big\{ \int_{-\infty}^{0} \big[ (t-s)^{H-1/2} - (-s)^{H-1/2} \big]\, dB(s) + \int_{0}^{t} (t-s)^{H-1/2}\, dB(s) \Big\}.$$

Define $B_H(t)$ for t < 0 similarly. Kolmogorov (1940) and Levy 
(1953) are two early papers on such integrals. We can also view 
fractional Brownian motion as a realizable moving average, with 
weighting function $(t-s)^{H-1/2}$ on white noise. 

For Z(t), a fractional Brownian motion with parameter H, 
the increments process X(i) is self-similar with self-similarity 
exponent H. If Z(t) is (ordinary) Brownian motion then 
Z(t+s)-Z(t) has mean zero and standard deviation $s^{1/2}$ (an 
"$s^{1/2}$ law"). Almost all sample paths of fractional Brownian 
motion are continuous and are not differentiable in the mean 
square sense for $H \in (0,1)$. The standard deviation of the 
increments (with lag s) of Z(t) with parameter H is $s^H V(H)$ 
(an "$s^H$ law") where 

$$V(H) = \{\Gamma(H+1/2)\}^{-1} \Big\{ \int_{-\infty}^{0} \big[ (1-s)^{H-1/2} - (-s)^{H-1/2} \big]^2\, ds + \frac{1}{2H} \Big\}^{1/2}.$$

These increments are called fractional Gaussian noise. 


We will now summarize some properties of these processes. 
If an arbitrary self-similar process has stationary increments, 
and is continuous (in mean square) then 0 < H < 1. If an 
arbitrary Gaussian self-similar process is non-constant, has stationary 
increments and continuity (m.s.) then it is fractional Brownian 
motion. Consider the relationship of J to H for some special 
classes of processes. 


(i) J = H: X(t) stationary, $E[X^2] < \infty$, X(t) ergodic, 
X*(t) is self-similar with exponent H 
(J = H = 1/2 is the special case of X(t) i.i.d.) 


(ii) $J = H' - H''/2 + 1/2$; $0 \le J \le 1$: $X^*(t)$ in the domain of 
attraction of a self-similar H' process, 
$X^{2*}(t)$ in the domain of attraction of a self-similar 
H'' process. 


(iii) J = 1/2: X(t) i.i.d., $E[X^2] = \infty$; X(t) in the 
domain of attraction of a stable, $0 < \alpha < 2$, process. 


(iv) J = 1: X(t) any non-stationary process that becomes 
stationary when differentiated (or differenced) one or 
more times. 


(v) J = H: Z(t) fractional Brownian motion, 
$H \in (0,1)$. 


There are other examples of self-similar processes such as Hermite 
processes and Rosenblatt processes which we will not discuss. 


We have seen that $J \ne 1/2$ is a measure of long run 
dependence. What about the self-similarity exponent H? Assume 
Z(t); $-\infty < t < \infty$ is self-similar with exponent $H \in (0,1)$; 
that Z(0) = 0, E[Z(t)] = 0, and $E[Z^2(t)] < \infty$; and that the 
increments X(i) are stationary. Then the correlation function 
of the X(i) is given by 

$$r(k) = \tfrac{1}{2}\big[\, |k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H} \big].$$


Clearly, for H = 1/2 the increments are i.i.d. For $H \ne 1/2$ 
we have two cases identified by Mandelbrot. 


(i) Persistent long run dependence: This is the case where 
1/2 < H < 1. Then $r(0) + r(1) + \cdots = \infty$ and the spectral 
density is infinite at the origin. 


(ii) Antipersistent long run dependence: This is the case where 
0 < H < 1/2. Then $\cdots + |r(-1)| + |r(0)| + |r(1)| + \cdots < \infty$, 
and $\cdots + r(-1) + r(0) + r(+1) + \cdots = 0$. The spectral density 
is zero at the origin. 
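
The two cases can be explored numerically from the standard correlation function of the increments, r(k) = (1/2)[|k+1|^(2H) - 2|k|^(2H) + |k-1|^(2H)]. The sketch below uses hypothetical function names and illustrative values of H. The two-sided partial sum over |k| <= N telescopes to (N+1)^(2H) - N^(2H), which grows without bound for H > 1/2 and tends to zero for H < 1/2.

```python
import numpy as np

def increment_correlation(k, H):
    """r(k) for the increments of a self-similar process with exponent H."""
    k = np.abs(np.asarray(k, dtype=float))           # r(k) is even in k
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1) ** (2 * H))

def two_sided_sum(H, N):
    """Partial sum of r(k) over k = -N, ..., N."""
    k = np.arange(-N, N + 1)
    return float(np.sum(increment_correlation(k, H)))

r1_persistent = float(increment_correlation(1, 0.8))      # positive lag-one correlation
r1_antipersistent = float(increment_correlation(1, 0.3))  # negative lag-one correlation
sum_persistent = [two_sided_sum(0.8, N) for N in (100, 10_000)]
sum_antipersistent = two_sided_sum(0.3, 10_000)
```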


6. STRONG DEPENDENCE, WEAK DEPENDENCE, AND MIXING 


Previous sections have dealt with seemingly unrelated topics. 
These included asymptotic independence conditions, the rescaled 
adjusted range statistic, self-similar processes, and the 
definition of strong and weak dependence. In this section, it will be 
seen that these topics are more intimately related and have 
applications in theoretical physics. 


A significant area of activity in theoretical physics has 
been the development of algebraic, in particular, group theoretic 
representations of fundamental physical systems. An important 
algebraic structure occurring in this work is the renormalization 
group [Benettin et al., (1977)]. The motivation has been the 
modeling of critical phenomena, turbulence, and geophysical pheno- 
mena. The relationship to stochastic processes and dependence 
concepts arises when critical behavior and the renormalization 
group is interpreted in a probabilistic context. Cassandro and 
Jona-Lasinio (1978) develop this relationship. They define a 
process to be weakly dependent if it is g-mixing. Therefore, 
all of the "nice" processes that are ergodic, for which central 
limit theory holds, etc., are weakly dependent. Strong dependence 
is characterized by any process that violates g-mixing and 
corresponds to long-term dependence. Not only do long-term dependence 
and mixing have meaning in terms of physical phenomena but the 
dichotomy of persistent and antipersistent long run dependence 
has interpretations in terms of physical systems [Jona-Lasinio 
(1977)]. 


The physical interpretation of long run dependence leads to 
some interesting paradoxes in the interpretation of the 
corresponding spectral representations. In statistical language, the 
spectrum is a function which represents the distribution of the 
total variance of a stochastic process over frequency, the 
frequency being the Fourier frequency of the orthogonal sinusoidal 
components. In physics, the variance is interpreted as energy and 
intervals in the spectral domain correspond to energy in the 
corresponding passband. Integrals, over their domain, of spectra of 
weakly dependent processes are finite. For strongly dependent 
processes, however, the spectra are not so nice [Taqqu (1980)]. 


The spectral density for this class of processes has a power 
law form which we will write as 

$$S(\lambda) = |\lambda|^{\alpha},$$

where $\lambda$ is the frequency and exponent $\alpha$ is related to the 
underlying process. The nature of this relationship is not 
central and will be left to the references. Drawing an analogy to 
visible light, large $|\lambda|$ is considered in the ultraviolet 
region and small $|\lambda|$ is considered in the infra-red region. 
When $\alpha > -1$ the spectrum has heavy (non-integrable) tails and 
is said to exhibit the ultraviolet catastrophe. For $-1 < \alpha < 0$ 
the spectral density is infinite at the origin and exhibits the 
infra-red crisis. For $\alpha < -1$ the spectrum is non-integrable 
near the origin and this phenomenon is called the infra-red 
catastrophe. When $\alpha = -1$ the process is called 1/f noise and has 
many physical interpretations [Voss (1979), Mandelbrot (1967)]. 
It is worth noting that many real life processes have been 
observed to have power law spectra over extremely large 
frequency bands. 
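
The regimes named above follow from elementary integrability of a power function: it is integrable near the origin only when the exponent exceeds -1, and integrable at infinity only when the exponent is below -1. The sketch below simply encodes the classification as stated in the text; the function name and the returned labels are choices made for this illustration.

```python
def classify_power_law(alpha):
    """Regimes of the power law spectral density |lambda|**alpha, as named in the text."""
    if alpha == -1:
        return ["1/f noise"]
    labels = []
    if alpha > -1:
        # Integral over [1, infinity) diverges: non-integrable tails.
        labels.append("ultraviolet catastrophe")
        if alpha < 0:
            # Density is infinite at the origin but still integrable there.
            labels.append("infra-red crisis")
    else:
        # alpha < -1: integral over (0, 1] diverges.
        labels.append("infra-red catastrophe")
    return labels

regimes = {alpha: classify_power_law(alpha) for alpha in (-2.0, -1, -0.5, 1.0)}
```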


7. CONCLUSION 


This discussion is only an introduction to the topic and is 
far from complete. For example, other measures of long run 
dependence have been developed with very different motivations. 
The Allan Variance of Time-and-Frequency metrology is 
mathematically different from the R/S statistic and was developed 
completely independently. The end result, its interpretation, and 
its graphical representation, however, are virtually identical. 
The relationships of the Allan Variance to the R/S statistic 
and of long run dependence to multivariate dependence concepts 
have also been considered. 


SOME DISTRIBUTION THEORY RELATED TO THE ANALYSIS 
OF SUBJECTIVE PERFORMANCE IN INFERENTIAL TASKS 


SUMMARY. When inferences made by a subject differ from normative 
statistical inferences it is possible to define measures of this 
divergence which throw considerable light on the nature of the 
deficiencies in the subjective performance. These measures in- 
volve quantifying differences between distributions. For the 
analysis of a single subject performing a single task the paper 
generalizes previous work by replacing absolute measures by more 
meaningful relative measures and by extending the scope of the 
inferential tasks involved. For a single subject performing many 
tasks and for groups of different subjects statistical analysis 
requires a rich parametric class of distributions to describe 
patterns of variability of probabilistic data, and the use of 
logistic-normal distributions is advocated for this purpose. 
Simple illustrations are provided of the main analytical techniques. 


KEY WORDS. Degree of uncertainty, feature selection discrepancy, 
inference discrepancy, inferential tasks, information gain index, 
logistic normal distributions, normative models, performance 
analysis, probabilistic data, subjective inference. 


1. INTRODUCTION 


Many individuals in the course of their routine work have to 
make important intuitive or subjective inferences on the basis of 
similar information in repetitive but independent situations. For 
example, a clinician on the basis of his past experience and on 
the information provided by symptoms, signs and diagnostic tests 
has to judge the relative plausibilities of possible diseases 
for each patient. A personnel manager, on the basis of a 
curriculum vitae and information and impressions gained at interview, 
has to assess the chances of success in the post advertized for 
each applicant. A process controller having some picture of the 
variability of a characteristic of a manufactured item relative 
to the quality of the material input, has to infer a likely range 
for the characteristic of a new item with known material input. 

A biochemist on the basis of the circles cleared in an infected 
medium by droplets of an antibiotic of known standard concentra- 
tions and the circles cleared by droplets of a blood specimen of 
unknown concentration makes inferences concerning this unknown 
concentration in an often subjective way from graphical consider- 
ations. 


For some of these inference tasks the person concerned may 
have recourse to statistical help or to a computer package but 
this is not invariably so. Moreover it is of intrinsic interest 
to study the ability and variability of individuals in various 
professions in making appropriate inferences subjectively. It 
is the purpose of this paper to describe the underlying concepts 
of distribution theory and useful methods of analyzing the 
performance of such subjects in inferential tasks related to their 
work. 


Apart from any psychological, inter-disciplinary or cross- 
cultural insights such performance analyses may provide we have 
found them an excellent vehicle for statistical education. The 
author has now some ten years' experience of presenting inferential 
tasks to students of statistics at various stages of study, in 
service courses to students of other disciplines, to school 
children from fourth to sixth forms, to civil servants, clinicians, 
physicists, historians and even to statistician colleagues. In 
his admittedly subjective judgement the immediacy of the challenge 
of such an inferential task, particularly when presented in a 
familiar context, stimulates in the subject a keen interest. 
Moreover subjects, so often finding their own performance far 
from adequate and at variance with their peers, appreciate that 
statistical methodology may have much to give to their particular 
discipline and look forward to learning the 'secrets' of proper 
inferential methods. 


2. INFERENTIAL TASKS, STATEMENTS AND TRIALS 


We first establish a terminology, notation and framework for 
the study of subjective performance analysis in inferential tasks. 


Inferential task. In an inferential task (such as medical 
diagnosis or antibiotic assay) a subject (clinician, biochemist) is 
presented with a case (patient, blood-sample) for which an 
inferential statement (diagnostic assessment, antibiotic assay) 
concerning the true hypothesis or index (true disease type, true 
concentration of antibiotic) is required. The subject is aware 
that the case has associated with it a unique but unknown index 
belonging to a known index set T (set of feasible diseases 
assumed or defined to be mutually exclusive, range of possible 
antibiotic concentrations). To help him arrive at his inferen- 
tial statement for a particular case the subject has available 
information concerning the case in the form of data on a number of 
features (results of diagnostic tests, clearance circle diameters) 
which can thus be regarded as a feature vector in some defined 
feature vector space X. 


Inferential statement. An inferential statement assigns a weight 
w(t) to each index $t \in T$ in such a way that $w(t_1)/w(t_2)$ can 
be interpreted as the relative probability or plausibility of 
$t_1$ compared with $t_2$. We can usually arrange for the weighting 
function to correspond to a probability or plausibility function 
p(t) over T, for example, by considering $p(t) = w(t)/\sum_T w(t)$ 
or $w(t)/\int_T w(t)\,dt$. 
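
For a finite index set the normalization is a single division. The sketch below is illustrative only; the function name and the example weights over three hypothetical diseases are assumptions. It shows that the ratios of weights, and hence the relative plausibilities, are preserved.

```python
import numpy as np

def plausibility_from_weights(weights):
    """p(t) = w(t) / sum of w(t) over a finite index set T."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive total")
    return w / w.sum()

weights = [3.0, 1.0, 1.0]             # hypothetical weights for three diseases
p = plausibility_from_weights(weights)
ratio_before = weights[0] / weights[1]
ratio_after = p[0] / p[1]
```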


Previous experience and training sets. The subject may already 
have some training and experience in the kind of inferential task 
under study, but this is seldom quantifiable. Examples again are 
clinician and biochemist with skills in diagnosis and assay. But 
inferential tasks can be selected so that the relevant experience 
and training are under the control of the experimenter, and hence 
quantifiable. For example, where diagnostic tests unfamiliar to 
the clinician have been evolved he can be presented with 
information on the complete training set of cases whose diagnoses and test 
results are known. In the antibiotic assay a training set is an 
essential ingredient of the task; since clearance diameter is 
known to vary from batch to batch of infected medium, it is 
essential for the subject to know the concentrations and 
clearance diameters of a training set of cases, referred to in 
assay work as 'the standards'. 


We write D to denote the data of a training set of m 
cases for each of which the true index and feature vector are 
known. Then D takes the form 

$$D = \{(t_i, x_i) : i = 1,\ldots,m\},$$

where $t_i \in T$, $x_i \in X$ $(i = 1,\ldots,m)$. 


Inferential trials. In an inferential trial a subject is presented 
with a test set of n unrelated or independent cases and on the 
basis of their feature vectors $y_1,\ldots,y_n$ is asked to make 
inferential statements about their unknown indices $u_1,\ldots,u_n$. We 
shall concentrate mainly on trials where the inferential statements 
required of the subject are composite in nature, say density 
functions $s_1,\ldots,s_n$ on T. The performance data thus consist of 
the set 

$$\{(y_i, s_i) : i = 1,\ldots,n\}.$$


Having defined our terms we can now briefly describe previous 
work in the analysis of subjective performance in inferential tasks. 
Note that we confine the term inferential task to the provision 
by the subject of the equivalent of a probability distribution 
over the complete set of possible indices or hypotheses. In 
particular, we are not concerned with subjective studies involving 
other forms of inference, such as assigning scores on a psychotic 
scale to profiles of psychiatric patients (Goldberg, 1970), or of 
decision-making, such as in artificial tasks (Davidson et al., 
1957) or in medical treatment decisions (Aitchison et al., 1973). 
Moreover our emphasis is on the analysis of actual performance and 
not on devices to encourage good probability estimation (Good, 
1952), though measures of performance and penalties for bad 
estimation are, of course, related. 


Early studies of performance in inferential tasks as defined 
here are by Edwards and Phillips (1964) and Phillips and Edwards 
(1966), who report conservative use of information in simple 
inferences concerning the composition of bags containing colored 
poker chips. For an extreme case of such conservatism among 
American lawyers, see Raiffa (1968, p. 20). Taylor et al. (1971) 
move from the artificial context of chips in bags to real problems 
of medical diagnosis and try to face clinicians with realistic 
tasks of inferring diagnoses from sequentially acquired informa- 
tion on patients. Taking a statistical diagnostic system as a 
norm they then analyse performance in terms of various measures 
of discrepancy from normative assessments, one of these measures 
being a generalization of the Edwards-Phillips simple version of 
conservatism. Technical definitions and discussion of these 
measures and their use in other inferential tasks are to be found 
in Aitchison (1974), Aitchison and Kay (1973, 1975) and Kay (1976). 


3. MEASURES OF NORMATIVE COMPARISON FOR A SINGLE 
SUBJECT PERFORMING A SINGLE TASK 


Normative model and system. When it is possible to formulate a 
rational model according to which a subject with the given infor- 
mation ought to be making his inferences then we can compare the 
subject's actual conclusions with the corresponding normative 
inference. In situations where such a normative model can be 
specified then standard procedures can be applied to use the data 
D of the training set to obtain a fitted model or normative system 
which may be applied to the cases of a test set to produce 
inferential statements about these cases. The technical 
statistical details of the construction of a normative system need not 
concern us until we consider specific areas of study. A 
normative system can thus be expressed in the form of a conditional 
density function 

$$p(u \mid y, D)$$

over T, for a test case with feature vector y. 


When the normative system is applied to the feature vectors 
$y_1,\ldots,y_n$ of the test set it produces normative statements, 
say $r_1,\ldots,r_n$. 


Nature of comparisons. Comparison of how a subject's performance 
departs from normative performance thus requires the 
construction of measures of the extent of the departure of statements 
$s_i$ from corresponding normative statements $r_i$ $(i = 1,\ldots,n)$. 


In the tasks so far defined we have considered the feature vector 
information being supplied in one piece and there being a single 
final inferential statement. Later we shall consider tasks where 
the feature vector information is supplied in sequence to the 
subject and he is asked to make an inference statement after each 
step in the sequence. In such circumstances when considering a 
typical step we shall have information about the relative 
probabilities assigned to the unknown index before as well as after 
the feature information is given to him. To deal with this at the 
present stage of our discussion we therefore suppose that for each 
case i the subject makes an inferential statement $q_i$ prior to 
receiving the feature vector $y_i$ for the case $(i = 1,\ldots,n)$. 


For a single test case or inferential task therefore we have 
to suppose that there is a prior inferential statement q(t) on 
T and that on the basis of knowledge of the feature vector y 
for the case we have to compare the subsequent subjective in- 
ferential statement s(t) on T with the corresponding norma- 
tive inferential statement r(t) on T. 


Later, in Sections 6 and 7, we shall be concerned with 
variability in s(t) from task to task within the same subject 
and between subjects performing the same task. For the moment we 
confine attention to meaningful summary characteristics or 
measures of performance, along the lines of those introduced by 
Taylor et al. (1971). Their characteristics are, however, 
measured on an absolute scale, not related to what is directly 
achievable within the inferential task, and so often difficult 
to interpret. A first purpose of this paper is to replace them 
where necessary by measures defined on a more meaningful relative
scale.
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Degree of uncertainty. Associated with any composite inferential 
statement, say q(t) on T, there is the now familiar informa- 
tion theory concept (Shannon, 1948; Khinchin, 1957) of a degree 

of uncertainty U{q(t)} or U(q) remaining in the identification
of the true index:

$$U(q) = -\sum_{T} q(t) \log q(t) \quad \text{or} \quad -\int_{T} q(t) \log q(t)\, dt, \tag{1}$$

where q(t) log q(t) = 0 if q(t) = 0.
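
As a small numerical illustration of the degree of uncertainty for a finite index set, the following Python sketch evaluates the summation form for a probability vector. The function name `uncertainty` and the three example statements are illustrative assumptions, not taken from the studies discussed here; natural logarithms are assumed.

```python
import math

def uncertainty(q):
    """Degree of uncertainty U(q) = -sum q(t) log q(t), taking 0 log 0 = 0."""
    if min(q) < 0 or abs(sum(q) - 1.0) > 1e-9:
        raise ValueError("q must be a probability vector")
    return -sum(p * math.log(p) for p in q if p > 0)

# Hypothetical inferential statements over three types.
uniform = [1 / 3, 1 / 3, 1 / 3]   # maximal uncertainty, log 3
sharp = [0.9, 0.05, 0.05]         # most of the mass on one type
certain = [1.0, 0.0, 0.0]         # no uncertainty left

for name, q in [("uniform", uniform), ("sharp", sharp), ("certain", certain)]:
    print(name, round(uncertainty(q), 4))
```

The uniform statement attains the maximum log 3 for three types, and a statement concentrated on a single type has zero uncertainty.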


Inference discrepancy. Since the subject records an inferential 
statement s(t) on T which, according to the normative system, 
should be r(t) on T we require, in order to assess his ability 
in inference, an overall measure of the difference between s(t) 
and the target r(t). This is provided by an information theory 
measure I(r,s) due to Kullback and Leibler (1951): 


$$I(r,s) = \sum_{T} r(t) \log \frac{r(t)}{s(t)} \quad \text{or} \quad \int_{T} r(t) \log \frac{r(t)}{s(t)}\, dt, \tag{2}$$

with the property that

$$I(r,s) > 0 \quad (s \neq r).$$
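
The following Python sketch evaluates the Kullback-Leibler measure for a finite index set and illustrates that it vanishes when the subject reproduces the normative statement and is not symmetric in its arguments. The function name `discrepancy` and the example vectors are illustrative assumptions; the convention of returning infinity when s(t) = 0 but r(t) > 0 is the usual one and is not stated in the text.

```python
import math

def discrepancy(r, s):
    """I(r, s) = sum r(t) log{r(t)/s(t)} for probability vectors r and s."""
    total = 0.0
    for rt, st in zip(r, s):
        if rt == 0:
            continue              # terms with r(t) = 0 contribute nothing
        if st == 0:
            return math.inf       # subject rules out a type the norm allows
        total += rt * math.log(rt / st)
    return total

r = [0.7, 0.2, 0.1]   # hypothetical normative statement
s = [0.4, 0.4, 0.2]   # hypothetical subjective statement

print(discrepancy(r, s), discrepancy(s, r), discrepancy(r, r))
```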


Information gain index. We can quantify such notions as
'underusing the information available', 'reading too much into the
data', 'going contrary to the evidence' in terms of an information
gain index G(q,r,s). Suppose that U(q) > U(r) so that the 
normative system has removed U(q) - U(r) of uncertainty or 
equivalently gained this amount of information about the index. 
The subject on the other hand has gained an amount U(q) - U(s) 

of information in his move from q to s. Consider now the 

ratio 


$$G(q,r,s) = \frac{U(q) - U(s)}{U(q) - U(r)}. \tag{3}$$


If G(q,r,s) > 1 then the subject has removed more uncertainty
than the normative move and so can be said to be acting liberally
or reading too much into the data. If 0 < G(q,r,s) < 1 the
subject is acting conservatively or underusing the data. If
G(q,r,s) < 0 then the subject is increasing the uncertainty

when he ought to be decreasing it and we can say that he is 
running contrary to the evidence. 


The same kind of argument applies to G(q,r,s) when
U(q) - U(r) < 0, and in the special circumstance when
U(q) - U(r) = 0 we set G(q,r,s) = $+\infty$, 1 or $-\infty$ according to
whether U(q) - U(s) is positive, zero or negative.
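
The classification just described can be sketched in Python as follows. The function names, the verbal labels returned by `describe`, and the example statements are illustrative assumptions; `describe` covers only the case U(q) > U(r), and its "boundary case" label for G exactly 0 or 1 is an addition made here for completeness.

```python
import math

def uncertainty(q):
    return -sum(p * math.log(p) for p in q if p > 0)

def gain_index(q, r, s):
    """G(q, r, s) = {U(q) - U(s)} / {U(q) - U(r)}, with the stated conventions."""
    normative_gain = uncertainty(q) - uncertainty(r)
    subject_gain = uncertainty(q) - uncertainty(s)
    if normative_gain == 0:
        if subject_gain > 0:
            return math.inf
        if subject_gain < 0:
            return -math.inf
        return 1.0
    return subject_gain / normative_gain

def describe(g):
    """Verbal reading of G when the normative move reduces uncertainty."""
    if g > 1:
        return "liberal"
    if 0 < g < 1:
        return "conservative"
    if g < 0:
        return "contrary to the evidence"
    return "boundary case"

q = [0.5, 0.3, 0.2]   # hypothetical prior statement
r = [0.8, 0.1, 0.1]   # hypothetical normative statement, U(r) < U(q)

for s in ([0.6, 0.25, 0.15], [0.98, 0.01, 0.01], [1 / 3, 1 / 3, 1 / 3]):
    g = gain_index(q, r, s)
    print(s, round(g, 3), describe(g))
```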
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Feature selection discrepancy. In a number of inferential tasks 
the subject may be faced not only with problems of updating an 
inferential statement on the basis of an observed feature vector 
y, but also that of selecting which feature from a set of alter- 
natives he would like to observe. For example, in diagnosis the 
clinician would almost certainly have to choose which of a number 
of diagnostic tests should be carried out. 


Suppose that from a starting density function p(t) on T
any one of a set F of features is available. Consider the choice
$f \in F$. If outcome x is observed and leads to a normative
posterior assessment p(t|x) then the reduction in uncertainty
or gain in information is U{p(t)} - U{p(t|x)}. In comparing the
relative merits of different feature selections we do not know
the outcome and so, following Lindley (1956), we have to measure
the merit of f in terms of the expected gain of information for
f from starting density function p(t):

$$H\{f, p(t)\} = \sum_{x} \left[ U\{p(t)\} - U\{p(t \mid x)\} \right] p(x). \tag{4}$$

The larger this is the more informative the feature is and so
a normative choice $f^* \in F$ satisfies

$$H\{f^*, p(t)\} = \max_{f \in F} H\{f, p(t)\}. \tag{5}$$

Note that $f^*$ depends on p(t): what is an optimum choice from
one p(t) may be poor from some other starting position. If a 
subject, at a declared assessment p(t) and faced with a choice 

of experiment, chooses f then the amount by which the expected 
gain of information for f falls short of the expected gain of 
information for f* gives the measure used by Taylor et al. (1971): 


H{f*, p(t)} - H{f, p(t)}. 


We can, however, more appropriately compare this with the subject's
worst possible choice $f_*$, where

$$H\{f_*, p(t)\} = \min_{f \in F} H\{f, p(t)\},$$

to obtain a relative measure of feature selection discrepancy:

$$S\{f, p(t)\} = \frac{H\{f^*, p(t)\} - H\{f, p(t)\}}{H\{f^*, p(t)\} - H\{f_*, p(t)\}}. \tag{6}$$

The measure S is confined to the range $0 \le S \le 1$, the
value 1 corresponding to the worst possible selection and the
value 0 to the normative selection.
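
A minimal Python sketch of the expected gain of information and of the relative feature selection discrepancy for a finite index set is given here. Each feature is represented by a table `lik[t][x]` of probabilities p(x|t); the three features "A", "B" and "C", the prior, and the function names are hypothetical, and returning 0 when all features are equally informative is a convention adopted only for this sketch.

```python
import math

def uncertainty(p):
    return -sum(v * math.log(v) for v in p if v > 0)

def expected_gain(prior, lik):
    """Expected gain of information for one feature; lik[t][x] = p(x | t)."""
    gain = 0.0
    for x in range(len(lik[0])):
        px = sum(pt * row[x] for pt, row in zip(prior, lik))
        if px == 0:
            continue
        post = [pt * row[x] / px for pt, row in zip(prior, lik)]
        gain += (uncertainty(prior) - uncertainty(post)) * px
    return gain

def selection_discrepancy(prior, features, chosen):
    """Shortfall from the best feature relative to the best-worst range."""
    gains = {name: expected_gain(prior, lik) for name, lik in features.items()}
    best, worst = max(gains.values()), min(gains.values())
    if best == worst:
        return 0.0            # convention for this sketch only
    return (best - gains[chosen]) / (best - worst)

prior = [0.5, 0.3, 0.2]       # hypothetical starting assessment over three types
features = {
    "A": [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]],   # strongly informative
    "B": [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],   # uninformative
    "C": [[0.6, 0.4], [0.4, 0.6], [0.5, 0.5]],   # weakly informative
}

for name in features:
    print(name, round(selection_discrepancy(prior, features, name), 3))
```

Because the gains are recomputed from the starting assessment passed in, the same functions can be reused at each stage of a sequential task with the current assessment in place of `prior`.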

Measures associated with normal assessments. For T a finite or
discrete set, such as in the diagnostic inferential tasks already 
cited, the computations of the measures described above are 
comparatively simple summations. When, as in prognostic, assay
and calibration studies, T is the real line or even a higher
dimensional real space then evaluations of the measures for
univariate and multivariate normal assessments and distributions
can prove useful either in their own right or as approximations.
Using the notation $N_d(\lambda, \Sigma)$ for a d-dimensional normal distribution
with mean vector $\lambda$ and covariance matrix $\Sigma$ we have the
following results.

$$U(q) = \begin{cases}
\tfrac{1}{2}\{1 + \ln(2\pi\sigma^2)\} & \text{when } q(t) \text{ is } N_1(\lambda, \sigma^2), \\
\tfrac{1}{2}\{d + \ln\det(2\pi\Sigma)\} & \text{when } q(t) \text{ is } N_d(\lambda, \Sigma).
\end{cases} \tag{7}$$

When r(t) is $N_d(\lambda, \Sigma)$ and s(t) is $N_d(\mu, \Omega)$ then

$$I(r,s) = \tfrac{1}{2}\{\operatorname{trace}(\Omega^{-1}\Sigma) - \ln\det(\Omega^{-1}\Sigma) - d\}
+ \tfrac{1}{2}(\lambda - \mu)'\,\Omega^{-1}(\lambda - \mu). \tag{8}$$

The simplification for the univariate case when r(t) is
$N_1(\lambda, \sigma^2)$ and s(t) is $N_1(\mu, \omega^2)$ is

$$I(r,s) = \tfrac{1}{2}\left\{\left(\frac{\sigma}{\omega}\right)^2 - \ln\left(\frac{\sigma}{\omega}\right)^2 - 1\right\}
+ \frac{(\lambda - \mu)^2}{2\omega^2}. \tag{9}$$

Note that the first bracketed part separates out a component of
the inference discrepancy which measures departure of the
subject's assessment of the covariance structure from the normative
value. The second component does not, however, give a pure
measure of the disagreement in means because of its involvement
with the subject's variance or covariance assessment $\omega^2$ or $\Omega$.


Since G(q,r,s) is a simple construction of U values 
there is no need to provide an explicit expression. 
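
As a check on the univariate normal results, the following Python sketch compares the standard closed forms, U = ½{1 + ln(2πσ²)} for the degree of uncertainty and I = ½{σ²/ω² − ln(σ²/ω²) − 1} + (λ − μ)²/(2ω²) for the inference discrepancy, with direct numerical integration of the defining integrals. The parameter values, function names and the simple midpoint rule are assumptions made for illustration.

```python
import math

def log_normal_pdf(t, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (t - mean) ** 2 / (2 * var)

def entropy_normal(var):
    """Closed form for the degree of uncertainty of a univariate normal."""
    return 0.5 * (1 + math.log(2 * math.pi * var))

def discrepancy_normal(lam, sigma2, mu, omega2):
    """Closed form for I(r, s), r normal (lam, sigma2), s normal (mu, omega2)."""
    ratio = sigma2 / omega2
    return 0.5 * (ratio - math.log(ratio) - 1) + (lam - mu) ** 2 / (2 * omega2)

def integrate(f, lo, hi, n=20000):
    """Midpoint rule; adequate for smooth, rapidly decaying integrands."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

lam, sigma2 = 1.0, 4.0    # hypothetical normative assessment r(t)
mu, omega2 = 0.0, 9.0     # hypothetical subjective assessment s(t)

def discrepancy_integrand(t):
    lr = log_normal_pdf(t, lam, sigma2)
    ls = log_normal_pdf(t, mu, omega2)
    return math.exp(lr) * (lr - ls)

def entropy_integrand(t):
    lr = log_normal_pdf(t, lam, sigma2)
    return -math.exp(lr) * lr

numeric_I = integrate(discrepancy_integrand, -40.0, 40.0)
numeric_U = integrate(entropy_integrand, -40.0, 40.0)
print(numeric_I, discrepancy_normal(lam, sigma2, mu, omega2))
print(numeric_U, entropy_normal(sigma2))
```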


When p(t) is $N_d(\lambda, \Sigma)$ the form of H{f, p(t)} depends on
whether in the inferential task it is more appropriate to
specify p(x|t), say as $N(\alpha + Bt, \Gamma)$, or to specify p(t|x), say as
$N(\delta + \Delta x, \Omega)$. In the first case

$$H\{f, p(t)\} = \tfrac{1}{2} \ln\det(I + \Gamma^{-1} B \Sigma B') \tag{10}$$

and in the second case

$$H\{f, p(t)\} = \tfrac{1}{2} \ln\det(\Omega^{-1}\Sigma). \tag{11}$$
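
The two routes to the expected gain in the normal case can be compared numerically. The sketch below assumes a two-dimensional index t with prior covariance `Sigma` and a single scalar feature x with mean linear in t (coefficient vector `b`) and error variance `gamma`; under these assumptions the gain computed from the likelihood specification, ½ ln(1 + b'Σb/γ), should equal the gain computed from the prior and posterior covariances, ½ ln(det Σ / det Ω), where Ω is the usual normal posterior covariance. All names and numerical values are hypothetical.

```python
import math

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def sigma_b(Sigma, b):
    """Matrix-vector product Sigma b for a 2 x 2 covariance."""
    return [Sigma[i][0] * b[0] + Sigma[i][1] * b[1] for i in range(2)]

def gain_from_likelihood(Sigma, b, gamma):
    """Expected gain when p(x | t) is specified: 0.5 ln(1 + b' Sigma b / gamma)."""
    Sb = sigma_b(Sigma, b)
    bSb = b[0] * Sb[0] + b[1] * Sb[1]
    return 0.5 * math.log(1 + bSb / gamma)

def posterior_covariance(Sigma, b, gamma):
    """Omega = Sigma - Sigma b b' Sigma / (b' Sigma b + gamma)."""
    Sb = sigma_b(Sigma, b)
    k = b[0] * Sb[0] + b[1] * Sb[1] + gamma
    return [[Sigma[i][j] - Sb[i] * Sb[j] / k for j in range(2)] for i in range(2)]

def gain_from_posterior(Sigma, Omega):
    """Expected gain when p(t | x) is specified: 0.5 ln(det Sigma / det Omega)."""
    return 0.5 * math.log(det2(Sigma) / det2(Omega))

Sigma = [[2.0, 0.5], [0.5, 1.0]]   # hypothetical prior covariance
b = [1.0, -0.5]                    # hypothetical regression coefficients
gamma = 0.8                        # hypothetical error variance

Omega = posterior_covariance(Sigma, b, gamma)
print(gain_from_likelihood(Sigma, b, gamma), gain_from_posterior(Sigma, Omega))
```

Note that in the normal case the posterior covariance does not depend on the observed x, which is why no averaging over outcomes appears in either expression.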
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4. SEQUENTIAL INFERENCE TASKS 


The measures of performance of Section 3 have been defined 
on the basis of the feature vector being presented as a whole for 
a single inference task. If it is meaningful to present the 
feature vector components one at a time or in successive blocks 
then the subject can be faced with a sequential inference task, 
being required to update his initial assessment q(t) immediately
after each component $x_1, \ldots, x_k$ has been presented to him
resulting in successive subject assessments, say $s_1(t), \ldots, s_k(t)$.


We can then clearly analyze his performance after each such sub- 
jective assessment. 


Such a sequential performance analysis can take two forms.
The first is a relative one in which at the ith stage we treat
the subject's present view $s_{i-1}(t)$ as the starting $q_i(t)$ in the
evaluation of the normative assessment $r_i(t)$ against which $s_i(t)$
is to be judged. Secondly, there is an accumulative or absolute
performance analysis which in the normative updating to obtain
$r_i(t)$ after the ith stage uses as starting assessment the
previous normative updating $r_{i-1}(t)$ rather than the subject's
$q_i(t)$. Which is the more appropriate will depend to some extent


on the nature of the particular inferential task. On the whole we 
prefer the relative analysis because it builds successively on the 
subject's immediately held belief and so has a greater opportunity 
of identifying particular circumstances in which discrepancies 
from the normative occur. 
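
The distinction between the relative and the absolute analysis can be made concrete with a short Python sketch. It assumes a hypothetical task with three types and three binary features that are independent for given type, so that normative updating is a direct application of Bayes's formula; the probability table `P1`, the revealed case and the subject's successive statements are invented for illustration.

```python
import math

def bayes_update(start, likelihoods):
    """Posterior over types from a starting assessment and p(x_i | t)."""
    joint = [p * l for p, l in zip(start, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def discrepancy(r, s):
    return sum(rt * math.log(rt / st) for rt, st in zip(r, s) if rt > 0)

# Hypothetical task: P1[t][i] = p(x_i = 1 | type t).
P1 = [[0.8, 0.3, 0.6], [0.4, 0.7, 0.5], [0.2, 0.5, 0.9]]
case = [1, 0, 1]                     # feature values revealed in sequence
q = [1 / 3, 1 / 3, 1 / 3]            # initial assessment
subject = [[0.45, 0.35, 0.20],       # hypothetical s_1, s_2, s_3
           [0.50, 0.25, 0.25],
           [0.40, 0.20, 0.40]]

def analyse(mode):
    """Inference discrepancy at each stage; mode is 'relative' or 'absolute'."""
    previous_subject, previous_norm, out = q, q, []
    for i, x in enumerate(case):
        lik = [P1[t][i] if x == 1 else 1 - P1[t][i] for t in range(3)]
        start = previous_subject if mode == "relative" else previous_norm
        r_i = bayes_update(start, lik)
        out.append(discrepancy(r_i, subject[i]))
        previous_subject, previous_norm = subject[i], r_i
    return out

relative = analyse("relative")
absolute = analyse("absolute")
print(relative)
print(absolute)
```

The two analyses coincide at the first stage, where both start from q(t), and generally diverge afterwards because the relative analysis restarts the normative updating from the subject's own latest statement.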


The definition of feature selection discrepancy lends itself 
easily to sequential inference tasks. At each stage of a se- 
quential inference task, instead of presenting the subject with 
the next component, we may ask him to choose which of the compon- 
ents not so far revealed is likely to clarify the uncertainty 
most. At the ith stage, with his current subjective assessment
at $s_i(t)$, if he chooses feature $f_i$ from the set $F_i$ of
features available then replacement of f, F and p(t) by
$f_i$, $F_i$ and $s_i(t)$ respectively in the definition of S at
(6) produces the appropriate feature selection discrepancy
for the ith stage.


5. SOME SPECIFIC INFERENTIAL TASKS 


In this section we refer to some inferential tasks already 
partially reported in the literature, describe briefly some others 
which we have tested in pilot studies, and indicate some other 
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applied areas which may be worthy of investigation. In particu- 
lar, we shall draw attention to one or two interesting and 
unexpected aspects of the associated performance analysis which,
in our view, call for further and more careful investigation.


Diagnostic inferential tasks. A number of such tasks have been 
reported by the author and colleagues so that there is no need 
to give more than the relevant references and a few brief comments. 
First, we emphasize that in these tasks diagnosis is presented 
as an inference rather than a decision problem, the subject 
being required, for a sequence of patients, to assign probabili- 
ties to the possible disease types on the basis of patient 
information released to him either sequentially or as a whole. 
For a valid performance analysis it is necessary to know exactly 
what information about a case is known to the subject. It is 
therefore not possible to allow the subject to see the patient 
lest he collects visual or other information unknown to the 
analyst and so information must be supplied verbally or possibly 
at some visual display unit. To the extent that there is no 
contact with the patient it could be claimed that such studies 
do not put clinician subjects into their natural inference- 
making setting but most subjects regard the tasks presented as 
fair tests of diagnostic skills. Moreover when interest is in 
comparing the inferential skills of clinicians with those of 
other professions direct access to patients is clearly not 
possible. 


Performance analysis studies of diagnostic inference differ 
in a number of respects: 


(i) the extent to which the experience of the subject in
the particular diagnostic area has already been 
acquired and so is not determinable or can be 
completely supplied by the analyst (see the previous 
comments in Section 2); 

(ii) the extent to which the information on a new case 
can be supplied sequentially; 

(iii) in a sequential task the extent to which the choice 
of the next feature is required of the subject;


(iv) the extent to which any assumptions of the normative 
assessments are valid. 


An early study in this area (Taylor et al., 1971) was concerned
with differential diagnosis of non-toxic goitre, depended
on the self-attained experience of six clinicians in a specialist
clinic, and was sequential in nature, with the clinician at each
stage choosing the next feature. Each of the features was
categorical, with either two or three categories, and for the 
normative model the p(x|t) probabilities were assessed on the 
basis of data on about 50 cases for each of three types and on 
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the assumption that for given type features are independent. To 
the extent that no allowance was made for sampling error and that 
the independence assumption was undoubtedly suspect this study 
may be criticized. Nevertheless as an expository aid for infer- 
ential principles in diagnosis, particularly with its strong 
visual appeal in the presentation of sequential diagnostic 
inference as paths within a triangle or triangular bowl of 
uncertainty, it has proved invaluable. One of the awkward
features in this example is undoubtedly the difficulty in obtaining
simple models for multivariate categorical data, though with
greater familiarity with, and availability of programs for,
log-linear analyses and kernel density function estimation, more
appropriate normative models could be devised.


This difficulty of modelling the dependence of the features 
does not arise when the features are quantitative and when the 
feature vectors, or possibly transformed feature vectors, for 
given type, follow a multivariate normal form, since for example 
in sequential inference conditional dependence can be easily 
expressed and programmed. Details of these technical aspects 
may be obtained in Aitchison and Kay (1975) and Kay (1976). 
Aitchison and Kay (1973) presented a non-sequential differential 
diagnostic problem as a competition. ‘Experience’ took the form 
of details of the six-dimensional features of twelve cases of 
each of three types, this training set and the subsequent 
cases being generated by multivariate normal simulations. Another
instructive inferential task is a simulated situation
variously presented as Doctor's Trilemma or differential
diagnosis of three forms of Newmath Syndrome. In this task
'experience' consists of informing the subjects in simple terms
that the ten features are, for given type, independent and binary,
and each subject has the thirty binary probabilities in front
of him as he updates in the diagnostic triangle as the binary
features of the case are revealed to him sequentially. For
further details, see Aitchison and Kay (1973) and Aitchison 
(1974). 


It is clear that any real or simulated task in this diag- 
nostic area can be easily presented, the only constraint being 
that for real diagnostic tasks we have available an appropriate 
and sufficient training set on which to base normative assess- 
ments. One aspect of normative assessments worth noting is 
that predictive rather than estimative assessments in the sense 
defined by Aitchison et al. (1977) are to be recommended. 


Predictive and prognostic inferential tasks. Since we shall be
describing below a calibrative inferential task which calls for 

a density-function type assessment similar to those required here 
we shall confine ourselves to the bare outline of such tasks. 
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Suppose that we give a subject who is aware of the concept 
of the normal density function a set of observations from a normal 
distribution set out on a horizontal scale and then pose him the 
following task. Another observation is about to be recorded. 

Can the subject draw a pattern of plausibilities, essentially an 
unscaled density function, on the horizontal scale given, whose 
heights show his subjective assessments of the relative
plausibilities of the various possible values? This requires a
composite assessment which we can convert to proper density function
form s(t) to be compared against a normative assessment r(t).
One point worth noting here is that the usually fitted normal
curve with sample mean and standard deviation substituted is not
an appropriate normative assessment and is better replaced by
a Student-type density function for the reasons given in
Aitchison (1975).
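
The contrast between the plug-in normal curve and a Student-type predictive curve can be sketched numerically. The sketch assumes the familiar vague-prior form of the predictive density for a further observation from a normal sample of size n, namely a Student density with n − 1 degrees of freedom, location equal to the sample mean and scale s√(1 + 1/n); this specific form is an assumption made here and should be checked against Aitchison (1975). The data and function names are hypothetical.

```python
import math
import statistics

def student_pdf(t, df, loc, scale):
    """Density of a location-scale Student distribution."""
    z = (t - loc) / scale
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(z * z / df)) / scale

def normal_pdf(t, mean, sd):
    z = (t - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

data = [9.1, 10.4, 11.2, 9.8, 10.6]   # hypothetical observations
n = len(data)
m = statistics.mean(data)
s = statistics.stdev(data)            # sample standard deviation, divisor n - 1

def predictive(t):
    """Assumed Student-type predictive density for the next observation."""
    return student_pdf(t, n - 1, m, s * math.sqrt(1 + 1 / n))

def plug_in(t):
    """Fitted normal curve with sample mean and standard deviation substituted."""
    return normal_pdf(t, m, s)

for k in (0, 2, 4):
    t = m + k * s
    print(k, predictive(t), plug_in(t))
```

With only five observations the Student-type curve is lower at the centre and markedly higher in the tails than the plug-in normal curve, reflecting the uncertainty about the mean and variance that the plug-in curve ignores.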


More complicated inferential tasks here involve regression- 
type situations. In such a task the set T will usually be the 
real line, the set of possible response or dependent variables, 
while X is one- or higher-dimensional and consists of
possible explanatory, concomitant or independent variables. A
typical simple inferential task in this area is to provide the 
subject with a regression-type scatter diagram with x-axis 
horizontal and t-axis vertical, then to ask the subject, after 
suitable explanation of the meaning of the task, to provide a 
density function s(t) or 'pattern of plausibility' for the 
possible t values corresponding to a given x. 


Calibrative inferential tasks. The type of task here is best 
described in terms of a specific simple example which we have
given to a variety of subjects with some very interesting 
results. For this task each subject receives a copy of Figure 1 
which is the training set, data for the 'standard curve' for 

an assay or calibration, and the background to the problem is 
explained to the subjects. The problem concerns the assay of 
the concentration t of an antibiotic in a patient's blood. 
Droplets of standard preparations of known concentrations $t_i$
of the antibiotic are placed on a prepared infected medium on
Petri dishes and, after cooking for 24 hours, the diameters $x_i$
(mm) of the circles cleared, which are of course related to the
concentration but in a statistical rather than a deterministic
way, are recorded. In Figure 1 these $(t_i, x_i)$ points are
plotted. The subject is then made aware that the problem is to
try and infer something about the unknown concentration of anti- 
biotic in the patient's blood from knowledge only of the 
diameter of the clearance circle from a single droplet. He is
invited to make use of the numbered patterns of variability
shown in Figure 2 and supplied to him on a transparent sheet,
to place what he regards as an appropriate pattern on the
horizontal t-axis. The meaning of such patterns is explained to him
in some detail, for example,


(i) that the mode of the pattern selected should
naturally be placed above the concentration
regarded as most plausible;

(ii) that with such patterns the relative heights of the
curve above any two concentrations should reflect
the subject's view of the relative plausibilities
of these concentrations;

(iii) that the 'narrower' the pattern chosen the more
precise the subject is regarding the method of
assay;

(iv) that, since only a finite number of patterns can be
provided, he is free to choose an in-between pattern
by 'interpolation', recording his choice to one
decimal place; for example, pattern 4.3 is
intermediate between pattern 4 and pattern 5 but nearer
to 4 than 5.


FIG. 1: Scattergram of (antibiotic concentration, clearance
diameter) for standards in calibration task. [Axes: antibiotic
concentration; diameter (mm) of clearance circle. The sheet also
carries a response form with spaces, for each of tasks 1 and 2,
for the most plausible stimulus and the curve number.]

FIG. 2: Patterns of variability for subjective inference in
calibration task.


At the outset the subject is told that all concentrations 
are equally likely. In our studies two tasks are given. First 
the subject is told that the diameter from a single droplet of 
the patient's blood was 19 mm and is asked to identify his pattern
by writing down the most plausible value and the pattern curve 
number. In the second he is told three diameters of 18.5, 

18.5 and 20 mm. Again he is asked to select, on the basis of 
this information, his pattern by again noting his chosen most 
plausible value and pattern curve number. The assessments of 
performance can, of course, be easily quantified by comparison 

of the selected curve s(t) with a normative curve r(t), such 
as the calibrative density function of Aitchison and Dunsmore 
(1976, Chapter 10) or a normal approximation to it. The measures 
(8)-(11) are then appropriate. 


One interesting and surprising feature of the results is 
that in each of a number of different groups - statistical stu- 
dents in various years, clinicians, physicists - approximately 
one half choose a wider pattern in the second task than in the 
first, contrary to the common sense view that more experimentation 
should provide a more precise inferential statement. It is 
clearly a phenomenon which is well worth further investigation. 
One possible explanation is that with the single diameter some 
subjects have a tendency to forget about or underestimate the 
variability in diameter for given dose, whereas when they are 
presented with three diameters showing variability they take 
account of this variability. 
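
The common-sense expectation referred to here can be illustrated with a deliberately simplified Python sketch. It assumes a straight-line standard curve x = a + bt + error with known coefficients and known error standard deviation `tau`, together with the stated starting position that all concentrations are equally likely; under these assumptions the density for t given m diameters with mean x̄ is normal with mean (x̄ − a)/b and standard deviation τ/(|b|√m). The numerical values of `a`, `b` and `tau` are hypothetical, and the sketch ignores the uncertainty in the fitted line which a full calibrative density would include.

```python
import math

# Hypothetical standard curve: diameter = a + b * concentration + error.
a, b, tau = 10.0, 3.0, 0.9

def normative_curve(diameters):
    """Mean and standard deviation of the simplified normal density for t."""
    m = len(diameters)
    xbar = sum(diameters) / m
    return (xbar - a) / b, tau / (abs(b) * math.sqrt(m))

mode1, sd1 = normative_curve([19.0])               # first task
mode2, sd2 = normative_curve([18.5, 18.5, 20.0])   # second task

print(mode1, sd1)
print(mode2, sd2)
```

The three diameters of the second task average exactly 19 mm, so in this simplified setting the most plausible concentration is unchanged while the spread shrinks by a factor of 1/√3; a wider pattern in the second task is therefore the opposite of what such a normative curve suggests.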
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Further more thorough and more extensive inferential task 
studies are being prepared particularly in this area of calibra- 
tion. For example, it will be interesting to discover how 
skillful subjects are in choosing between different response 
variables - for example, do they appreciate the role played by the 
slope of the response line and the variability about the line? 

It is even possible to present inferential tasks involving both 
calibrative and diagnostic skills. 


6. DISTRIBUTIONS OF INFERENTIAL STATEMENTS 


When studying a single subject performing different
tasks or a number of different subjects performing the same
task we will be faced with a set of inferential statements
$s_1, \ldots, s_n$, each a probability distribution over the set T.
For the statistical analysis of such data it is clearly an
advantage to consider distributions of probability distributions
over T. Consider first the situation when T is finite
(d+1)-dimensional. Then $s_1, \ldots, s_n$ are probabilistic data in the
sense that they can be represented by points in the d-dimensional
simplex

$$S^d = \{(u_1, \ldots, u_d) : u_i > 0 \ (i = 1, \ldots, d),\ u_1 + \cdots + u_d < 1\}.$$


Figure 3 shows the inferential statements of 56 statisticians, 
undertaking the same inferential task in a diagnostic competition 
at a conference in multivariate analysis; Figure 4 shows the 
corresponding statements of 11 second-year undergraduates. 


Distributions over the simplex have been recently discussed 
by Aitchison and Shen (1980) and Aitchison (1981), who advocate 
the class of logistic-normal distributions as a richer, more 
statistically tractable class than the more familiar Dirichlet 
class. If $u \in S^d$ then $v_i = \ln(u_i/u_{d+1})$ $(i = 1, \ldots, d)$, where
$u_{d+1} = 1 - u_1 - \cdots - u_d$, defines a vector v in $R^d$. If v
follows a $N_d(\mu, \Sigma)$ distribution then u is said to follow a
corresponding logistic-normal distribution $L_d(\mu, \Sigma)$ over $S^d$.
The term logistic-normal is used since u is related to v by
the logistic transformation

$$u_i = e^{v_i} / (1 + e^{v_1} + \cdots + e^{v_d}) \quad (i = 1, \ldots, d).$$
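
The logratio transformation and its logistic inverse are easily programmed. In the Python sketch below an inferential statement over d + 1 types is held as a full probability vector whose last component plays the role of the reference category; the function names and the example statement are illustrative assumptions.

```python
import math

def logratio(u_full):
    """Map d+1 positive probabilities summing to 1 to the logratio vector v."""
    last = u_full[-1]
    return [math.log(ui / last) for ui in u_full[:-1]]

def logistic(v):
    """Inverse map: recover all d+1 probabilities from v in d dimensions."""
    denom = 1 + sum(math.exp(vi) for vi in v)
    return [math.exp(vi) / denom for vi in v] + [1 / denom]

s = [0.6, 0.3, 0.1]      # hypothetical statement over three types (d = 2)
v = logratio(s)
back = logistic(v)
print(v)
print(back)
```

Statements with a zero component cannot be transformed in this way, since the logratio is then undefined; how such statements should be handled is not addressed by this sketch.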


There are, indeed, some grounds for expecting that the
pattern of variability of inferential statements may follow
logistic-normal distributions. If, for given t, the
components $x_1, \ldots, x_k$ of the feature vector are
independently distributed with density functions $p_i(x_i \mid t)$ then
from prior probabilities p(t) $(t \in T)$ we have, by Bayes's
formula,

$$v_j = \ln\frac{p(t_j \mid x)}{p(t_{d+1} \mid x)}
= \ln\frac{p(t_j)}{p(t_{d+1})} + \sum_{i=1}^{k} \ln\frac{p_i(x_i \mid t_j)}{p_i(x_i \mid t_{d+1})}
\quad (j = 1, \ldots, d).$$

Provided that the distributions $p_i(x_i \mid t)$ satisfy sufficient
regularity conditions for the central limit theorem to be valid
and that k is a reasonable size, then the $v_j$, being sums, will
tend to be jointly normally distributed, and so the true
inferential statements will follow logistic-normal variability. Of
course in practice the independence assumption will not be valid,
so that logistic-normality of true inferential statements would
have to rely on a central limit property for dependent sums.
But we suspect that subjects have considerable difficulty in
taking account of dependence in their assessments so that if
their subjective process does correspond to some rough and
ready form of Bayes's formula it is likely to be in an
approximately independent form. All this is speculative and it seems
doubtful whether the subjective process can ever be investigated
in this amount of detail. But at least there is a prima facie
case for investigating logistic-normality of distributions of
inferential statements for finite T.


FIG. 3: Diagnostic inferences of 56 statisticians.

FIG. 4: Diagnostic inferences of 11 second-year undergraduates.
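
The additive structure that underlies this argument, in which the posterior logratio equals the prior logratio plus a sum of log likelihood ratios over independent features, can be verified directly. The Python sketch below assumes three types, four binary features independent for given type, and takes the last type as the reference category; the prior, the probability table `P1` and the observed case are hypothetical.

```python
import math

prior = [0.5, 0.3, 0.2]                  # hypothetical p(t_1), p(t_2), p(t_3)
P1 = [[0.8, 0.3, 0.6, 0.7],              # P1[t][i] = p(x_i = 1 | type t)
      [0.4, 0.7, 0.5, 0.2],
      [0.2, 0.5, 0.9, 0.5]]
x = [1, 0, 1, 1]                         # hypothetical observed feature vector

def lik(t, i):
    return P1[t][i] if x[i] == 1 else 1 - P1[t][i]

def logratio_by_sum(j):
    """Prior logratio plus the sum of log likelihood ratios against the reference type."""
    ref = len(prior) - 1
    v = math.log(prior[j] / prior[ref])
    for i in range(len(x)):
        v += math.log(lik(j, i) / lik(ref, i))
    return v

def posterior():
    """Direct application of Bayes's formula under independence."""
    joint = [prior[t] * math.prod(lik(t, i) for i in range(len(x)))
             for t in range(len(prior))]
    total = sum(joint)
    return [p / total for p in joint]

post = posterior()
for j in range(2):
    print(logratio_by_sum(j), math.log(post[j] / post[2]))
```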


For T non-finite, such as the real line in the calibration 
problem, the description of the variability of inferential state- 
ments, now probability distributions over the real line, is much 
more difficult. If the task takes the form of the selection of 
a normal curve then an inferential statement is equivalent to 
(m, s), where m and s are the mean and standard deviation of
the selected normal curve. For this task we then have to select 
some suitable joint distributions for (m,s) and it is possible 
that some normal-Wishart form may be appropriate. 


7. TWO STUDIES INVOLVING INFERENTIAL TASKS 


We now illustrate the application of logistic-normal analysis 


to the inferential statements of groups of subjects, performing 
diagnostic tasks. 


7.1 Doctor's Trilemma. The subjects were 48 first-course sta- 
tistics students and each was presented with four tasks, the four 
cases requiring a diagnosis between three types with information 
from the ten binomial tests presented sequentially. For details 
of the training set and cases presented see Aitchison (1974). 
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Each of the 24 possible orders of presentation of the cases 

was allocated to two students, the allocation being at random. 
The trial was conducted in two sessions. At the first session, 
early in the course and before students had met the appropriate 
technical tool of Bayes's formula each student tackled his first 
two cases. The remaining two cases were presented at the 
second session, some six weeks later and after meeting Bayes's 
formula in lectures. Subjects were not informed that Bayes's 
formula was the appropriate tool and were not allowed to write 
anything on paper except their inferential statements. 


Kay (1976) has analyzed the various measures of performance 
of Section 3 separately for this study. Here the analysis applies 
to the complete inferential statement and not just single
aspects. If $s_{ij}$ denotes the inferential statement of the ith
subject on the jth task presented to him then a model $\Omega$ for
the analysis of the various possible effects is as follows:

$$s_{ij} \text{ is } L_2(\mu + \alpha_i + \beta_j + \gamma_{K(i,j)},\ \Sigma),$$


where K(i,j) = 1 or 2 according as the case comes before or 
after knowledge of Bayes's formula. The usual form of
identifiability restrictions applies:

$$\sum_{i} \alpha_i = 0, \quad \sum_{j=1}^{4} \beta_j = 0, \quad \gamma_1 + \gamma_2 = 0.$$

Here the $\alpha_i$ and $\beta_j$ denote subject and task effects and
non-zero values of $\gamma_1$ and $\gamma_2$ will indicate some effect associated
with knowledge of Bayes's formula.


For testing any hypothesis $\omega$ within the model $\Omega$ the
usual chi-squared approximation form, at significance level $\alpha$,
of the generalized likelihood ratio test can be expressed in the
form

$$192 \log \frac{\det \hat\Sigma_\omega}{\det \hat\Sigma_\Omega} > \chi^2(r; \alpha),$$

where $\hat\Sigma_\omega$ and $\hat\Sigma_\Omega$ are the maximum likelihood estimates of $\Sigma$
under $\omega$ and $\Omega$, r is the number of independent constraints
on the parameters required to specialize $\Omega$ to $\omega$ and $\chi^2(r; \alpha)$
is the upper $\alpha$ point of the chi-squared distribution with r
degrees of freedom. Figure 5 gives the complete lattice of
hypotheses with the test quantities and their critical values.
Moving down the lattice we can reject all the hypotheses except
the one marked by the asterisk. Thus we must conclude from this
study that there are significant subject and task effects but
that there is no evidence of a 'Bayes's effect' on subjective
performance, though the shortfall of the test quantity 5.12
from the critical value 5.99 is perhaps small enough to encourage
the undertaking of more and larger studies.


FIG. 5: Lattice of hypotheses associated with Doctor's Trilemma
study. At each node the appropriate value of the chi-squared
test statistic is shown with, in brackets, the associated number
of degrees of freedom. [Nodes as printed, from top to bottom:
random; Bayes's effect, subject effect, task effect; Bayes's +
subject effect, Bayes's + task effect, subject + task effect;
general model.]
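
The arithmetic behind the comparison of 5.12 with 5.99 can be reproduced in a few lines of Python. The sketch uses the fact that a chi-squared variable with 2 degrees of freedom is exponential with mean 2, so that its upper α point is −2 log α and its upper tail probability at a value c is exp(−c/2); the function names are illustrative, and the determinants passed to `lr_statistic` in any real use would come from fitted covariance estimates, which are not reproduced here.

```python
import math

def lr_statistic(n, det_restricted, det_general):
    """n log(det restricted / det general) for the generalized likelihood ratio test."""
    return n * math.log(det_restricted / det_general)

def chi2_upper_point_2df(alpha):
    """Upper alpha point of chi-squared with 2 degrees of freedom."""
    return -2 * math.log(alpha)

def chi2_tail_2df(value):
    """Upper tail probability of chi-squared with 2 degrees of freedom."""
    return math.exp(-value / 2)

critical = chi2_upper_point_2df(0.05)
tail = chi2_tail_2df(5.12)
print(round(critical, 2), round(tail, 3))
```

On this calculation the reported test quantity of 5.12 on 2 degrees of freedom has an upper tail probability between 5 and 10 per cent, which is consistent with the remark that the shortfall from the critical value is small.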


7.2 Statistician's Syndrome. In this analysis three groups of
subjects, 56 professional statisticians, 11 second-year statistics 
students and 9 clinical consultants, were each presented with 
five tasks involving the differential diagnosis of three types 
based on six quantitative features. The complete simulated train- 
ing set of 36 cases, 12 of each type was given, together with 
information that for each of these equally prevalent types the 
distributions of the features were normally distributed, the 
three mean vectors and covariance matrices also being given to 
the subjects. These details of the training set and the five new 
cases are to be found in Aitchison and Kay (1973). For each 
group there are highly significant subject and task effects. 

The extensive subject variability can be easily illustrated by 
Figure 3 which shows in terms of triangular coordinates the 
inferential statement of the 56 statisticians for one of the cases. 
If, for a given task, group g has inferential statements which
are $L_2(\mu_g, \Sigma_g)$ distributed then interest is in testing such
hypotheses as $\Sigma_1 = \Sigma_2 = \Sigma_3$ and $\mu_1 = \mu_2 = \mu_3$. We have made
such comparisons between our three groups for each task with the 
following results. For only one of the tasks is there no sig- 
nificant difference between the groups. For all the remaining 
four tasks there are significant differences between the mean 
vectors, highly significant at 0.1 per cent for three of these 
tasks, though for only one task is there a significant difference, 
at 5 per cent, between the covariance matrices. 


Thus there is evidence, not only that subjects within a 
group vary but that there can be significant differences between 
the performance of different groups' subjects. Since a normative
statistical system, such as predictive statistical diagnosis, can
for each task supply a normative inferential statement
translatable into logratio value $\mu$ we can test whether a group's $\mu_g$
is significantly different from $\mu$. For all three groups and
all five tasks there are significant differences of group means 
from the normative value. 
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EVERY BODY HAS ITS MOMENTS 
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SUMMARY. The distributions of ray and secant lengths through 
arbitrary convex bodies for five different randomness measures have
previously been obtained. This article concentrates on the ray
and secant moment relationships that exist among these various
measures. In particular various moments depend only on the volume
and surface content of the n-dimensional convex body. Specific
moment relationships have been obtained for regular convex bodies 
which include all n-dimensional regular polyhedra. The result 
will be illustrated by applications. 


KEY WORDS. Geometrical probability, convexity, random secants. 


1. INTRODUCTION 


Random paths through convex bodies have attracted consider- 
able interest recently. An introduction to the subject can be 
found in Kendall and Moran (1963) with an updating by Moran (1966, 
1969). More recent work can be found in Coleman (1969), Miles 
(1969) and Kingman (1969). Extensive references to applications 
may be found in Kellerer (1971). 


This article will deal with moment relationships of rays and
secants that occur under various types of randomness. The
formalism will be an extension of that developed in Enns and
Ehlers (1978, 1980).


A secant is a straight line segment traversing a convex body
K. A ray is a line segment originating within the body K and
terminating on its surface. The lengths of these rays and secants
are determined by their generation mechanism. Five different
methods of generation will be considered, namely:


$\nu$-randomness: A point within K and a direction, each with
              independent uniform distributions, can be used
              to define a ray and a secant.


$\mu$-randomness: A secant is defined by a direction $\theta$ and by its
              intersection P with an (n-1)-dimensional hyperplane
              through some fixed origin and normal to $\theta$.
              The point P and direction $\theta$ have independent
              uniform distributions.


$\lambda$-randomness: Two points are chosen independently within K,
              each with a uniform distribution. These points
              define three random lengths, namely a ray, a
              secant and the distance between the two points.


$\alpha$-randomness: Two points are selected independently, one on the
              surface of K and the other inside K. Both
              points are obtained from uniform distributions
              over their respective domains. These points
              define a ray and a secant.


$\gamma$-randomness: A point on the surface of K and a direction into
              K, each with independent uniform distributions,
              can define a secant of K.


The distributions of these ray and secant lengths have been
obtained in the above references and are summarized in Appendix
I.


It will be shown that for a special class of convex bodies,
which we will call regular, moments of ray and secant lengths can
be related for all five types of randomness. This could be
useful in determining the type of randomness which has generated
a sample of secants which can be measured. This will be illustrated
in the applications.


Every body has its moments! If we've experienced the right 
moments, how much do we know about the body? This question, 
too, is answered in our applications. 


2. OVERLAP VOLUMES AND SURFACES


The distributions in this paper (see Appendix I) have been
obtained from or are written in terms of overlap volumes and
surfaces of the convex region under consideration. Let $S(\cdot)$
and $V(\cdot)$ represent the surface content and volume of $(\cdot)$
respectively. Let $K(\ell, \theta)$ represent the convex region K
translated a distance $\ell$ in direction $\theta$. If $E_\theta(\cdot)$ denotes
the mean of $(\cdot)$ when uniformly averaged over direction, then
define:


    $$\Omega(\ell) = E_\theta[V(K \cap K(\ell,\theta))]/V(K)$$

    $$\omega(\ell) = E_\theta[S(K \cap K(\ell,\theta))]/S(K).$$


These represent the normalized average overlap volume and
surface content respectively of K with its translated self.


If K' is a shell of uniform width h about K, then in
Enns and Ehlers (1980), it was found that


    $$\dot\Omega_K(\ell) = \lim_{h \to 0} [\Omega_{K \cup K'}(\ell) - \Omega_K(\ell)]/h
                 = \frac{S(K)}{V(K)} [\omega_K(\ell) - \Omega_K(\ell)]. \qquad (1)$$


The subscript K will be dropped whenever there is no ambiguity 
about the region under consideration. 


From the form of the distributions in Appendix I, it is
evident that one can obtain two sets of moment relationships.
One set relates $\nu$, $\mu$ and $\lambda$ randomness moments, the second
set $\alpha$ and $\gamma$ randomness moments. Equation (1) will be used
to relate these two sets.


3. MOMENT RELATIONSHIPS FOR ARBITRARY K 


Let $E_r(X^m)$ denote the expected value of the mth moment of
the random variable X under r randomness, where
$r \in [\nu, \mu, \lambda, \alpha, \gamma]$ and $X \in [R, L, T]$ represents either a ray,
secant or segment respectively.


The distributions of R under $\nu$-randomness and L under
$\gamma$-randomness yield the following moment relationships:


    $$E_\nu(R^m) = m \int_0^\infty \ell^{m-1}\,\Omega(\ell)\,d\ell \qquad (2)$$

    $$E_\gamma(L^m) = m \int_0^\infty \ell^{m-1}\,\omega(\ell)\,d\ell \qquad (3)$$


Moments of all other random variables can be expressed in
terms of the above basic moments. The two sets of relationships,
where "n" represents the dimension throughout, are given by:


    $$E_\nu(R^{m+n}) = \frac{E_\nu(L^{m+n})}{m+n+1}
                 = \frac{C_{n-1}\,S(K)\,E_\mu(L^{m+n+1})}{n(m+n+1)\,C_n\,V(K)}
                 = \frac{E_\lambda(R^m)\,V(K)}{C_n}
                 = \frac{E_\lambda(L^m)\,(n+1)\,V(K)}{C_n\,(n+m+1)}
                 = \frac{E_\lambda(T^m)\,(n+m)\,V(K)}{n\,C_n} \qquad (4)$$

and

    $$E_\gamma(L^{m+n}) = \frac{2(m+n)\,V(K)}{n\,C_n}\,E_\alpha(R^m) = 2\,V(K)\,E_\alpha(L^m)/C_n. \qquad (5)$$


The question now becomes: can these two sets of moment
relationships be connected? A special class of convex bodies
will be considered in Section 4 where this connection can be
explicitly obtained.


3.1 Moments Which Depend Only on the Surface Content and Volume
of K. The following results have appeared in Enns and Ehlers
(1978, 1980):


    $$\int_0^\infty \Omega''(\ell)\,d\ell = -\Omega'(0) = \frac{C_{n-1}\,S(K)}{n\,C_n\,V(K)},$$

    $$\int_0^\infty \ell^{n-1}\,\Omega(\ell)\,d\ell = \frac{1}{2}\int_0^\infty \ell^{n-1}\,\omega(\ell)\,d\ell = \frac{V(K)}{n\,C_n}.$$


Combining these results with (2), (3), (4) and (5), one
obtains the following moment relationships which depend only on
the volume and surface content of K:


    $$E_\nu(R^n) = (E_\lambda(R^{-n}))^{-1} = V(K)/C_n \qquad (6)$$

    $$E_\nu(L^n) = (E_\lambda(L^{-n}))^{-1} = (n+1)\,V(K)/C_n \qquad (7)$$

    $$E_\gamma(L^n) = (E_\alpha(L^{-n}))^{-1} = 2\,V(K)/C_n \qquad (8)$$

    $$E_\mu(L) = (E_\nu(L^{-1}))^{-1} = \frac{n\,C_n\,V(K)}{C_{n-1}\,S(K)} \qquad (9)$$

    $$E_\mu(L^{n+1}) = (E_\lambda(L^{-(n+1)}))^{-1} = \frac{n(n+1)\,V(K)^2}{C_{n-1}\,S(K)} \qquad (10)$$


Equations (7), (9) and (10) have also appeared explicitly
or implicitly in Hadwiger (1950), Kingman (1969) and Miles (1969).


4. REGULAR CONVEX BODIES 


A regular convex body K is defined as a region whose sur- 
face content can be obtained by differentiating the equation of 
the volume with respect to a single variable. Examples include 
the n-sphere, n-cube, rhombus, regular polygons and regular poly- 
hedra of all dimensions. The definition also includes more 
complicated regions such as a cylinder with a hemispherical cap 
on both ends, namely a sausage. 


If K is regular, then V(K) can be expressed in terms of
a variable b, such that d V(K)/db = S(K). In this case the
volume of $K \cup K'$ is V(K) with b replaced by b+h. Consequently, for
regular K, (1) may be written as


    $$\dot\Omega(\ell) = d\,\Omega(\ell)/db = [\omega(\ell) - \Omega(\ell)]\,S(K)/V(K). \qquad (11)$$


Multiplying (11) by $m\,\ell^{m-1}$ and integrating over $\ell$, one obtains,
by (2) and (3), the following differential equation relating moments:


    $$\frac{d}{db} E_\nu(R^m) = [E_\gamma(L^m) - E_\nu(R^m)]\,S(K)/V(K). \qquad (12)$$


If we let $E_\gamma(L^m) = (A + 1)\,E_\nu(R^m)$, then (12) becomes:


    $$\frac{d}{db} E_\nu(R^m) = A\,E_\nu(R^m)\,S(K)/V(K). \qquad (13)$$


Clearly A must be a dimensionless quantity. Thus, either
A is a function of "n" and "m" only, or it additionally
depends in a dimensionless manner on the length b. The latter
is possible only if the volume of K is a function of two or
more lengths. An example of this occurs when K is a sausage
of radius b and length h. The surface area can be obtained by
differentiating the volume with respect to b only, hence the
sausage is a regular convex body. In this case A could be a
function of the dimensionless quantity (b/h).


If V(K) is a function of the length b only, then of
necessity A cannot be a function of b. In this case the
solution of (13) is:


    $$E_\nu(R^m) = c\,(V(K))^A.$$


In order that the dimensionality of this expression be
correct, we must have A = m/n. Hence for regular convex bodies
defined by a single dimensional quantity:


    $$E_\gamma(L^m) = \frac{n+m}{n}\,E_\nu(R^m). \qquad (14)$$


The corresponding relation between secants under $\nu$ and $\gamma$
randomness is:

    $$E_\nu(L^m) = \frac{n(m+1)}{n+m}\,E_\gamma(L^m).$$


Ehlers (1972) discovered this relationship for the square and
n-sphere. Coleman (1973) gives it for the square and cube. It
should be noted that (6) and (8) imply that (14) is true for any
convex body when m=n (n = dimension).
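
As a small numerical illustration of (14) and its secant counterpart, the sketch below converts a $\gamma$-random secant moment into the corresponding $\nu$-random ray and secant moments. The function name and the unit-disc example are illustrative assumptions, not part of the original paper; the value $E_\gamma(L) = 4/\pi$ for the unit disc follows from a chord length of $2\cos\phi$ with $\phi$ uniform on $(-\pi/2, \pi/2)$.

```python
import math


def nu_moments_from_gamma(e_gamma_l_m, n, m):
    """Return (E_nu(R^m), E_nu(L^m)) for a regular convex body in n dimensions.

    Uses relation (14), E_gamma(L^m) = ((n+m)/n) E_nu(R^m), and the secant
    relation E_nu(L^m) = (n(m+1)/(n+m)) E_gamma(L^m). The function name is
    hypothetical and chosen only for this sketch.
    """
    e_nu_r = n / (n + m) * e_gamma_l_m
    e_nu_l = n * (m + 1) / (n + m) * e_gamma_l_m
    return e_nu_r, e_nu_l


# Unit disc (n = 2, m = 1): E_gamma(L) = 4/pi.
e_nu_r, e_nu_l = nu_moments_from_gamma(4 / math.pi, n=2, m=1)
print(e_nu_r, e_nu_l)
```

For the unit disc this reproduces the mean $\nu$-random ray length $8/(3\pi)$, which can also be obtained directly from (2) by integrating the overlap function of two unit discs.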


4.1 Further Regularity Conditions. The previous section can be
extended by considering regions K where more than one dimensional
parameter is necessary. An example is a rectangle, which
is defined by a length and a width. We will restrict ourselves
to regions where the volume V(K) can be expressed in terms of
the variables $b_1, b_2, \ldots, b_k$, namely

    $$V(K) = f(b_1, b_2, \ldots, b_k)$$

such that

    $$\lim_{h \to 0} \frac{V(K \cup K') - V(K)}{h} = \sum_{i=1}^{k} \frac{\partial V(K)}{\partial b_i} = S(K).$$


Regions so defined include the rectangle but not the ellipse.
Relation (1) can now be written as:


    $$\sum_{i=1}^{k} \frac{\partial \Omega(\ell)}{\partial b_i} = \frac{S(K)}{V(K)}\,[\omega(\ell) - \Omega(\ell)]. \qquad (15)$$


For example, the rectangle $2b_1 \times 2b_2$ leads to

    $$\omega(\ell) = \Omega(\ell) + \frac{b_1 b_2}{b_1 + b_2}\left[\frac{\partial \Omega(\ell)}{\partial b_1} + \frac{\partial \Omega(\ell)}{\partial b_2}\right].$$


The $\gamma$-random moments of secant length may thus be obtained
directly from the $\nu$-random moments as


    $$E_\gamma(L^m) = E_\nu(R^m) + \frac{b_1 b_2}{b_1 + b_2}\left[\frac{\partial}{\partial b_1} E_\nu(R^m) + \frac{\partial}{\partial b_2} E_\nu(R^m)\right]$$


which is easily seen to be consistent with Coleman's (1973)
results. Similar results hold for the n-dimensional box and,
more generally, for the n-dimensional rhombus.


5. APPLICATIONS 


Kellerer (1971) lists extensive references of the applica- 
tions of random chord length transversals of convex bodies. The 
problems are broadly of three types. 


1. If the randomness is known and the type of body is known 
(i.e. triangle, ellipsoid, etc.) but the size of the body 
is unknown, then this can be estimated via the moments. This 
is self-evident and will not be elaborated upon. 


2. If the size and shape of the convex body is known, but the 
type of randomness is unknown, then this can be deduced in 
most cases from the secant length data. This is illustrated 
in Section 5.1 for the triangle. 


3. If nothing is known about the body, then Section 3.1 can be
   used to deduce the volume and surface area of the body
   provided one can generate a suitable type of randomness.
   One method whereby this can be achieved is if K is imbedded
   in a spherical container K' with a radius much larger than
   the maximum diameter of K. Coleman (1969) has shown that
   $\gamma$-random secants generated through K' become $\mu$-random
   secants within K. If the $\mu$-random secants can be measured
   then using (9) and (10) one can deduce the volume and surface
   area of K. In particular, if $\bar\ell$ and $\overline{\ell^4}$ are the observed
   $\mu$-secant first and fourth moments, then the volume and surface
   area in 3 dimensions can be estimated by:


   $$V(K) = \pi\,\overline{\ell^4}/(3\,\bar\ell) \quad\text{and}\quad S(K) = 4\pi\,\overline{\ell^4}/(3\,\bar\ell^{\,2}).$$
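
The estimators in item 3 can be sketched in a few lines. The code below is a minimal illustration, assuming a sample of measured $\mu$-random secant lengths in three dimensions; the function names are hypothetical. As a sanity check it uses the unit sphere, for which a $\mu$-random chord has length $2\sqrt{1-U}$ with U uniform on (0, 1), approximated here on a deterministic midpoint grid rather than by random sampling.

```python
import math


def volume_surface_from_moments(m1, m4):
    """Estimate (V, S) of a 3-D convex body from the first and fourth
    moments of mu-random secant lengths, using (9) and (10) with n = 3."""
    volume = math.pi * m4 / (3 * m1)
    surface = 4 * math.pi * m4 / (3 * m1 ** 2)
    return volume, surface


def volume_surface_from_sample(lengths):
    """Same estimate, starting from a list of observed secant lengths."""
    count = len(lengths)
    m1 = sum(lengths) / count
    m4 = sum(x ** 4 for x in lengths) / count
    return volume_surface_from_moments(m1, m4)


# Unit sphere check: chord length 2*sqrt(1 - u), u on a midpoint grid.
grid = 20000
chords = [2 * math.sqrt(1 - (i + 0.5) / grid) for i in range(grid)]
est_volume, est_surface = volume_surface_from_sample(chords)
print(est_volume, est_surface)
```

With exact moments $4/3$ and $16/3$ for the unit sphere, the formulas return $4\pi/3$ and $4\pi$; the grid sample reproduces these to within a small discretization error.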


5.1 Deducing the Randomness of Secants Through an Equilateral
Triangle. Appendix II states the new results $\Omega(\ell)$, $\omega(\ell)$ and
the $\gamma$-randomness moments for the equilateral triangle. Via
Sections 3 and 4 all other randomness moments may be obtained.
The means and variances are:


    $E_\gamma(L)$ = 0.5245 h      $\mathrm{var}_\gamma(L)$ = 0.0924 $h^2$
    $E_\mu(L)$ = 0.5236 h      $\mathrm{var}_\mu(L)$ = 0.0920 $h^2$
    $E_\nu(L)$ = 0.6994 h      $\mathrm{var}_\nu(L)$ = 0.0622 $h^2$
    $E_\alpha(L)$ = 0.7898 h      $\mathrm{var}_\alpha(L)$ = 0.0430 $h^2$
    $E_\lambda(L)$ = 0.8425 h      $\mathrm{var}_\lambda(L)$ = 0.0310 $h^2$


By examining a sample of secants with unknown randomness,
one could apply the standard classification procedures, such as
in Anderson (1958), to determine the appropriate randomness.
Obviously it will be difficult to distinguish between $\gamma$ and
$\mu$-randomness for the triangle.
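
Three of the tabulated means, and the $\gamma$ variance, can be re-derived from quantities stated elsewhere in the paper, which gives a quick consistency check. The sketch below assumes the Appendix II constants $A_1 = 3\ln 3/\pi$ and $A_2 = 2\sqrt{3}/\pi$, equation (9) with n = 2 (so $E_\mu(L) = \pi V/S$), and the secant relation following (14); the variable names are illustrative only.

```python
import math

h = 1.0  # height of the equilateral triangle (unit chosen for the check)

# gamma-randomness moments from Appendix II: E_gamma(L^m) = h^m A_m / (m + 1)
a1 = 3 * math.log(3) / math.pi
a2 = 2 * math.sqrt(3) / math.pi
mean_gamma = h * a1 / 2
var_gamma = h ** 2 * a2 / 3 - mean_gamma ** 2

# mu-randomness mean from (9) with n = 2: E_mu(L) = pi * area / perimeter
area = h ** 2 / math.sqrt(3)
perimeter = 2 * math.sqrt(3) * h
mean_mu = math.pi * area / perimeter

# nu-randomness mean from E_nu(L^m) = n(m+1)/(n+m) E_gamma(L^m), n = 2, m = 1
mean_nu = 2 * (1 + 1) / (2 + 1) * mean_gamma

print(round(mean_gamma, 4), round(mean_mu, 4), round(mean_nu, 4), round(var_gamma, 4))
```

The closeness of the $\gamma$ and $\mu$ means (they differ only in the third decimal place) is what makes those two kinds of randomness hard to separate for the triangle.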


APPENDIX I


RAY AND SECANT LENGTH PROBABILITY DENSITY FUNCTIONS
FOR ARBITRARY K.


Let R, L and T be random variables representing the ray
length, secant length and the distance between two points
respectively. Let the density functions be written as $f_{r,X}(\ell)$
where $X \in [R, L, T]$ and r is the randomness, namely
$r \in [\nu, \mu, \lambda, \alpha, \gamma]$.

    $$f_{\nu,R}(\ell) = -\Omega'(\ell)$$
    $$f_{\nu,L}(\ell) = \ell\,\Omega''(\ell)$$
    $$f_{\mu,L}(\ell) = -\Omega''(\ell)/\Omega'(0) = n\,C_n\,V(K)\,\Omega''(\ell)/[C_{n-1}\,S(K)]$$
    $$f_{\lambda,R}(\ell) = -C_n\,\ell^{n}\,\Omega'(\ell)/V(K)$$
    $$f_{\lambda,L}(\ell) = C_n\,\ell^{n+1}\,\Omega''(\ell)/[(n+1)\,V(K)]$$
    $$f_{\lambda,T}(\ell) = n\,C_n\,\ell^{n-1}\,\Omega(\ell)/V(K)$$
    $$f_{\alpha,R}(\ell) = n\,C_n\,\ell^{n-1}\,\omega(\ell)/[2V(K)]$$
    $$f_{\alpha,L}(\ell) = -C_n\,\ell^{n}\,\omega'(\ell)/[2V(K)]$$
    $$f_{\gamma,L}(\ell) = -\omega'(\ell)$$


where primes denote differentiation with respect to $\ell$ and $C_n$
is the volume of the unit n-sphere.


APPENDIX II


The overlap area and circumference of an equilateral triangle 
of height h are respectively: 


tee (x Ya) 2 ae! 0 1 
3 AS). Sr Bt i fe) aG8 < ops 
2 -l1 
Tag) =( 2 - 2+ E oe } p* + 3vp2-1 - (2#p°) sec p 
3 3 6 4 
4£ 1.8 Oi ee cies 

5 fe) tt) OF< Oo el 
T — 
3 w() = 

. Ena ov%p2-1 2 ot scape Se A 


where $\rho = \ell/h$. The $\gamma$-randomness moments are:


    $$E_\gamma(L^m) = h^m A_m/(m+1)$$


where $A_1 = 3(\ln 3)/\pi$, $A_2 = 2\sqrt{3}/\pi$,
Be Suey iesat ANCeT aA oe 20/(3¥73 1) 
_~k, 42 ofa a eee aes 
A+? k+l = 1 (k+1) 43 - ee tens 


SOME DISTRIBUTIONS IN THE THEORY OF GRAPHS


MICHAEL CAPOBIANCO


SUMMARY. In certain types of populations, one is interested in
the relationships among the individuals, e.g., animals and plants 
in a food web, persons in a social group, cities linked by 
various transportation routes, etc. These can be modeled by 
graphs or digraphs (directed graphs). Trying to infer something 
about these structures from a subgraph (sample) constitutes 
statistical inference in graphs (what we have called "stagra- 
phics"). There are many distributions which occur in graph 
theory which could be useful in inference. This paper is a 
survey of some of these distributions, specifically, degree dis- 
tributions, path length distributions, distance distributions, 
cycle distributions, triad distributions, and distributions 
connected with pairs of stars. Various properties of these 
distributions will be discussed, as will their connections with 
inference. Several open problems will be presented. 


KEY WORDS. graphs, digraphs, statistics, distributions. 


1. INTRODUCTION 


1.1 Graphs. A graph, G, is a non-empty set, V, of elements we
call the points or vertices of G, together with a set, E, of
unordered distinct pairs of distinct points of G. The elements
of E are called lines or edges of G. If $v, w \in V$, and
$(v,w) \in E$, then we say that v and w are adjacent. If we
denote the line (v,w) by e, say, then we say that v and
w are incident with e. The cardinality of the set of vertices,
|V|, denoted by N, is called the order of the graph, and |E|,
denoted by R, is called the size of the graph.


Graphs can be represented by diagrams in which the points 
are dots, and adjacencies are indicated by drawing line segments 
between two dots representing adjacent points. Figure 1 shows a 
representation of the graph V ={1,2,3,455,6,7i5 B= 162), 
(456) oe (Aa), a6). G7) 


Fig. 1                              Fig. 2


They can also be represented by certain (0,1)-matrices. We
define an adjacency matrix of a graph G by labeling the points
with integers from 1 to N as, for example, in Figure 1. The
$N \times N$ adjacency matrix $A = [a_{ij}]$ has all its entries equal to
zero except that $a_{ij} = 1$ if points labeled i and j are
adjacent in G. Thus for the graph represented in Figure 1, the
adjacency matrix is


[7 x 7 symmetric (0,1) adjacency matrix of the graph of Figure 1]


If we label the lines as well as the points, say as in
Figure 2, we can define the $R \times N$ incidence matrix $B = [b_{ij}]$ whose
entries are $b_{ij} = 1$ if the point labeled j is incident with the line
labeled i in G, and $b_{ij} = 0$ otherwise. Thus,
B for the graph of Figure 2 is

[R x N (0,1) incidence matrix B of the graph of Figure 2]


1.2 Digraphs. A digraph (directed graph), D, is a non-empty
set, V, of elements called points or vertices of D, together
with a set, E, of ordered distinct pairs of distinct points of
D. The elements of E are called lines or arcs of D. If
$v, w \in V$, and $(v,w) \in E$, then we say that v is adjacent to
w (or w is adjacent from v). |V| and |E| are called the
order and size of D and are denoted N and R, respectively.
Digraphs can be represented by diagrams in which points are dots,
and arcs are arrows. For example, the digraph
V = {1,2,3,4,5,6,7,8}, E = {(1,2), (2,3), (3,4), (4,1), (1,4),
(8,7), (3,5)} is shown in Figure 3.


Fig. 3


The adjacency matrix $A = [a_{ij}]$ for a digraph is $N \times N$ and
has all entries 0 except that $a_{ij} = 1$ if point i is adjacent to
point j in D. Thus, for the digraph of Figure 3,


        0 1 0 1 0 0 0 0
        0 0 1 0 0 0 0 0
        0 0 0 1 1 0 0 0
    A = 1 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 1 0


1.3 The Purpose of This Paper. In certain problems, we are
interested in the relations among individuals in a population, 
e.g., the food web of an animal and plant community, a social 
group where the relation could be acquaintance, the hierarchical 
structure of authority in an organization, a communication or 
transportation network, etc. Such populations, referred to by us 
as "populations with structure" (Capobianco, 1970) can be effec- 
tively modeled by graphs and digraphs. The points represent the 
individuals, and the lines or arcs display the relations. 


Our main research interest is in statistical inference in
graphs, or "stagraphics", which is the word we coined for it,
i.e., inferring something about the graph (structured population)
by observing a sample (a subgraph) from it. See Frank (1980b,c)
for surveys of this subject. O. Frank is one of the originators
of work in this field. It was in connection with problems of 
this type that we came across several properties of graphs which 
can be thought of as distributions, and it is to these that 
this working paper is devoted. It is intended as a survey, 
which, it is hoped, will stimulate further research in this 
area. Hence, few proofs will be given, but many concepts are 
explained and illustrated, several theorems are stated, and some 
unsolved problems are presented. 


2. DEGREE DISTRIBUTIONS 


2.1 Degrees. The degree of a point, v, in a graph, denoted
d(v), is the number of lines incident with v. For example, in
the graph of Figure 1, d(1) = 0, d(2) = d(3) = d(5) = 1,
d(4) = d(7) = 2, and d(6) = 3. In a digraph, any point v has
an indegree and an outdegree. These are denoted id(v) and
od(v), respectively, and are defined as the number of vertices
adjacent to v, and the number of vertices adjacent from v.


The (total) degree, td(v), of a point v is given by

    td(v) = id(v) + od(v).                                    (1)

For the digraph of Figure 3, we have id(1) = id(2) = id(3) =
id(5) = id(7) = 1, id(4) = 2, id(6) = id(8) = 0, od(1) = od(3) =
2, od(2) = od(4) = od(8) = 1, od(5) = od(7) = od(6) = 0.


We present next an easy, but important, result known as the
"first theorem of graph theory", although its corollary sometimes
goes by that name. We include the proofs since they are not
difficult, and will help introduce the reader to the "flavor" of
the subject.


Theorem 2.1. For any graph G,

    $$\sum_v d(v) = 2R \qquad (2)$$

where the sum is taken over all the vertices of G, and R, as
usual, denotes the size of G.


Proof. One of the quickest proofs is obtained by using the
incidence matrix. We count the number of 1's in this matrix in
two different ways, first by rows, then by columns. The former
yields the value 2R since there are R rows, and each one has
two 1's. The latter yields $\sum_v d(v)$ since the sum of each
column is obviously the degree of the point corresponding to the
column.


Corollary. In any graph, the number of points of odd degree is 
even. 


These results also hold for the total degrees in a digraph. 
In addition, we have the following: 


Theorem 2.2. In any digraph,


    $$\sum_v id(v) = \sum_v od(v) \qquad (3)$$

where the sums are taken over all points in the digraph.


Proof. Using the adjacency matrix provides an easy proof. We
count the number of 1's in two ways. Row-wise, we obtain the
right side of (3), and column-wise we obtain the left side.
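
The counting argument in this proof can be replayed on the digraph of Figure 3. The sketch below builds the adjacency matrix from the arc list given in Section 1.2 and sums it by rows and by columns; the variable names are illustrative.

```python
# Arc list of the digraph of Figure 3 (Section 1.2); points are labeled 1..8.
arcs = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 4), (8, 7), (3, 5)]
order = 8

# a[i][j] = 1 if point i+1 is adjacent to point j+1.
a = [[0] * order for _ in range(order)]
for v, w in arcs:
    a[v - 1][w - 1] = 1

outdegrees = [sum(row) for row in a]                                 # row sums
indegrees = [sum(a[i][j] for i in range(order)) for j in range(order)]  # column sums
total_degrees = [i + o for i, o in zip(indegrees, outdegrees)]

print(indegrees, outdegrees, sum(indegrees), sum(outdegrees))
```

Both sums equal the size R = 7, and the total degrees sum to 2R = 14, in line with the remark that Theorem 2.1 carries over to total degrees.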


An important problem in graph theory is to determine when
a sequence of integers is "graphical", i.e., when there exists a
graph having points whose degrees are the integers of the given
sequence. It is not sufficient that the integers sum to an even
number. For example, there do not exist graphs having points
with degrees 1,2,3, or degrees 1,1,1,3,4. Hakimi (1962) gives
necessary and sufficient conditions for a graphical sequence,
while Berge (1962, p. 86) gives a corresponding result for
digraphs.
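
One standard way to test whether a sequence is graphical is the Havel-Hakimi reduction: repeatedly remove the largest degree d and subtract 1 from the next d largest entries. The sketch below is an illustration of that well-known procedure, not a transcription of the conditions as stated in the cited references; the function name is hypothetical.

```python
def is_graphical(degrees):
    """Havel-Hakimi test: True if some simple graph has this degree sequence."""
    seq = sorted(degrees, reverse=True)
    if any(d < 0 for d in seq):
        return False
    while seq and seq[0] > 0:
        d = seq.pop(0)          # largest remaining degree
        if d > len(seq):        # not enough other points to be adjacent to
            return False
        for i in range(d):      # join to the d points of largest degree
            seq[i] -= 1
            if seq[i] < 0:
                return False
        seq.sort(reverse=True)
    return True


print(is_graphical([1, 2, 3]))               # even sum, yet not graphical
print(is_graphical([1, 1, 1, 3, 4]))         # even sum, yet not graphical
print(is_graphical([0, 1, 1, 1, 2, 2, 3]))   # degrees of the graph of Figure 1
```

The two counterexamples from the text are rejected, while the degree sequence quoted for the graph of Figure 1 is accepted.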


2.2 Degree Distributions of Graphs. The degree distribution of
a graph is a set of numbers $p_i$, $i = 0,1,2,\ldots,N-1$, where N is
the order of the graph, and $p_i$ is the proportion of vertices
having degree i. Clearly, the $p_i$'s form a probability distribution.
It will be convenient to illustrate degree distributions by
means of "spaced-out" histograms such as in Figure 4, which
illustrates the degree distribution for the graph of Figure 1.


Fig. 4                              Fig. 5


It is important to note that two graphs can be very 
different and have the same degree distribution. An example 
is shown in Figure 5. 


Frank (1978), using induced subgraph sampling, gave unbiased
estimates for the degree distribution, paying special attention
to $p_0$. In induced subgraph sampling, n points are chosen at
random and one observes all of the adjacencies among them.


The complement of a graph G, denoted $\bar G$, is a graph
whose points are the points of G, but in which two points are
adjacent if they are not adjacent in G, and not adjacent if
they are adjacent in G. In other words, to complement a graph
we remove all lines and insert lines where there were none. The
following results are immediate:


Theorem 2.3. If $p_i$, $i = 0,1,\ldots,N-1$, and $\bar p_i$, $i = 0,1,\ldots,N-1$,
are the degree distributions of G and $\bar G$ respectively, then
$\bar p_i = p_{N-1-i}$.

In other words, the degree distributions of complements are
reflections of each other about a central vertical axis. Figure
6 shows the complement of the graph of Figure 1 and its degree
distribution.


Fig. 6


We may ask if there are graphs for which $p_i = \bar p_i$,
$i = 0,1,\ldots,N-1$, i.e., graphs which have symmetric degree
distributions. It is well known that there are graphs which are
self-complementary, i.e., they are the same as their complements.
Three of these are shown in Figure 7. Obviously, they have
symmetric degree distributions. Do there exist graphs with
symmetric degree distributions which are not self-complementary?
Yes! An example is shown in Figure 8.


Fig. 7                              Fig. 8


Theorem 2.4. No graph has a nondegenerate uniform degree 
distribution. 


In fact, any graph having more than one point must have at 
least two vertices having the same degree. It is, however, 
possible to have a uniform distribution over part of the range of 
the degrees, such as, for example, in Figure 5. 


We define the mean degree $\mu_d$ of a graph as the mean of its
degree distribution, i.e.,

    $$\mu_d = \sum_{i=0}^{N-1} i\,p_i.$$

Since this is $\sum_v d(v)/N$, we have from Theorem 2.1 that
$\mu_d = 2R/N$. Thus, all graphs of the same order and size have the
same mean degree. Of course, there are many graphs of different
order and size with the same mean degree, e.g., the two shown in
Figure 9. From Theorem 2.3, the mean degree of the complement
of G, $\bar\mu_d$, is $N - 1 - \mu_d$.


The variance of the degree distribution is defined by

    $$\sigma_d^2 = \sum_{i=0}^{N-1} (i - \mu_d)^2\,p_i.$$


It is easy to get examples of graphs with equal means and
different variances. See Figure 10. It is also easy to prove
that $\sigma_d^2 = \bar\sigma_d^2$, where $\bar\sigma_d^2$ is the variance of the degree
distribution of the complement. The variance is important in the
comparison of several estimators of graph size (Capobianco and
Frank, 1979).
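
The statements about complements (reflected degree distribution, mean $N - 1 - \mu_d$, equal variance) are easy to confirm on a small case. The sketch below uses a hypothetical five-point graph chosen only for illustration (a path on points 1 to 4 plus an isolated point 5) and exact rational arithmetic; all function names are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import combinations


def degree_distribution(order, edges):
    """Return [p_0, ..., p_{N-1}] for a graph on points 1..order."""
    degree = [0] * (order + 1)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return [Fraction(sum(1 for v in range(1, order + 1) if degree[v] == i), order)
            for i in range(order)]


def complement(order, edges):
    """Edge list of the complement graph."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(range(1, order + 1), 2)
            if frozenset((u, v)) not in present]


def mean_and_variance(p):
    mean = sum(i * pi for i, pi in enumerate(p))
    variance = sum((i - mean) ** 2 * pi for i, pi in enumerate(p))
    return mean, variance


order, edges = 5, [(1, 2), (2, 3), (3, 4)]   # hypothetical example graph
p = degree_distribution(order, edges)
p_bar = degree_distribution(order, complement(order, edges))
mean, variance = mean_and_variance(p)
mean_bar, variance_bar = mean_and_variance(p_bar)
print(p, p_bar, mean, mean_bar, variance, variance_bar)
```

Here $\mu_d = 2R/N = 6/5$, the complement has mean $4 - 6/5 = 14/5$, and the two distributions are mirror images with the same variance.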


Cook (1979) has proved several theorems dealing with 
variances of degrees and sums of squares of degrees. In order to 
understand them, we require a few definitions from graph theory. 


A path in a graph is a sequence of distinct points $v_1, v_2, \ldots, v_k$
in which $v_i$ and $v_{i+1}$ are adjacent for $i = 1,2,\ldots,k-1$. If
$v_1$ and $v_k$ are the same, then the path is called a cycle. A
graph without cycles is said to be acyclic.


Theorem 2.5 (Cook). For any acyclic graph,

    $$\sum_v [d(v)]^2 \le N(N-1)$$

with equality if and only if the graph is the star shown in Figure
11.


Fig. 11


A graph is planar if it can be embedded in the plane.
Loosely speaking, this means that it can be drawn in the plane 
without any lines intersecting except at the vertices. 


Theorem 2.6 (Cook). For a planar graph, 


o,< V2N + jee : 
V2N 


Since the complement of any planar graph of order at least 9 
is nonplanar (Battle et al., 1962) this inequality holds for 
infinitely many non-planar graphs as well. 

The minimum degree in a graph is denoted $\delta$, and the maximum,
$\Delta$. Hence the range of the degree distribution is $\Delta - \delta$.
The minimum degree is important in problems dealing with the
connectivity of a graph. A graph is connected if there is a
path between any two points. The minimum number of points which
must be removed (along with their incident lines of course) in
order to disconnect the graph, or reduce it to a single point, is
called the connectivity of the graph and denoted $\kappa$. A well-known
theorem of graph theory states that $\kappa \le \delta$. Capobianco
(1972) used this to estimate $\kappa$.


Much work has been done on degree distributions of random
graphs. See Karoński (1978) and Robinson and Schwenk (1975).


2.3 Degree Distributions of Digraphs. In the case of digraphs,
the degree distribution is bivariate because we have indegrees
and outdegrees. We will picture these distributions as shown
in Figure 12, which is the degree distribution of the digraph of
Figure 3. For these distributions we use $\mu_{id}$, $\mu_{od}$, $\sigma_{id}^2$, $\sigma_{od}^2$
and $\rho$ to denote the means, variances, and correlation coefficient
of in and out degrees, respectively. It follows from (3)
that for any digraph $\mu_{id} = \mu_{od} = R/N$.


The investigation of other properties of these distributions
is a wide open area. How does one interpret $\rho$, for instance?
What is the relationship between the two variances?


3. DISTANCE DISTRIBUTIONS 


3.1 Introduction. The length of a path in a graph is the number
of edges in the path. The distance between two points, u, v, of
a graph, denoted by d(u,v), is the length of a shortest path
between u and v. The diameter of a graph is the maximum distance
between two of its points.


Fig. 12


The distance distribution of a graph G is a set of numbers
$q_i$, $i = 0,1,2,\ldots,N-1$, where $q_0$ is the proportion of pairs of
points of G which have no path between them (distance infinity),
and $q_i$, $i = 1,2,\ldots,N-1$, is the proportion of pairs of points
(u,v) of G such that d(u,v) = i, i.e.,

    $$q_0 = \frac{\text{number of pairs of points with no path between them}}{\binom{N}{2}}$$

    $$q_i = \frac{\text{number of pairs of points } (u,v) \text{ such that } d(u,v) = i}{\binom{N}{2}}$$

for $i = 1,2,\ldots,N-1$.
Again, we will display distance distributions by means of
spaced-out histograms. Figure 13 gives the distribution for the
graph of Figure 1.


Fig. 13


The following result is immediate. 


Theorem 3.1. (i) $q_0 = 0$ iff G is connected.

(ii) $q_1 = 2R/(N(N-1)) = \sum_v d(v)/(N(N-1))$.

(iii) If $q_0 \ne 0$, the diameter of G is infinite. If $q_0 = 0$,
the diameter of G is the largest i for which $q_i \ne 0$.

The following result is easy to prove by induction on N.
It was found useful in testing the hypothesis that a graph is
connected (Capobianco, 1980b).


Theorem 3.2. If G is connected, then 


It should be noted that two graphs can be different but have 
the same distance distribution. Figure 14 shows an example. 


For connected graphs, we define the mean distance $\mu_D$ and
variance of the distance distribution $\sigma_D^2$ as expected, namely,

    $$\mu_D = \sum_{i=0}^{N-1} i\,q_i, \qquad \sigma_D^2 = \sum_{i=0}^{N-1} (i - \mu_D)^2\,q_i.$$


This mean distance is a measure of how "compact" a graph
is. A graph is called complete if every pair of vertices is
adjacent. Complete graphs are the only ones with $\mu_D = 1$. The
opposite extreme is a graph which is a path. In this case,
$\mu_D = (N+1)/3$. It is also straightforward to compute
$\sigma_D^2 = (N+1)(N-2)/18$ for a path. Is this the upper bound on
variance?
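
The distance distribution, and the stated mean and variance for a path, can be computed with a breadth-first search from every point. The following is a minimal sketch under the definitions above (index 0 collects pairs with no path between them); the function names are illustrative assumptions.

```python
from collections import deque
from fractions import Fraction


def distance_distribution(order, edges):
    """Return [q_0, ..., q_{N-1}] for a graph on points 1..order (order >= 2)."""
    neighbors = {v: [] for v in range(1, order + 1)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    counts = [0] * order                 # counts[0]: pairs with no path
    for source in range(1, order + 1):
        dist = {source: 0}
        queue = deque([source])
        while queue:                     # breadth-first search from source
            u = queue.popleft()
            for w in neighbors[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for target in range(source + 1, order + 1):
            counts[dist.get(target, 0)] += 1
    pairs = order * (order - 1) // 2
    return [Fraction(c, pairs) for c in counts]


def mean_and_variance(q):
    mean = sum(i * qi for i, qi in enumerate(q))
    variance = sum((i - mean) ** 2 * qi for i, qi in enumerate(q))
    return mean, variance


n = 5
path_edges = [(i, i + 1) for i in range(1, n)]
q = distance_distribution(n, path_edges)
mean, variance = mean_and_variance(q)
print(q, mean, variance)
```

For the path on five points this gives $\mu_D = 2 = (N+1)/3$ and $\sigma_D^2 = 1 = (N+1)(N-2)/18$, and $q_1 = 2R/(N(N-1))$ as in Theorem 3.1.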


Three points of a graph are said to be collinear if they
can be labeled u, v, w in such a way that d(u,v) + d(v,w)
= d(u,w). We define the collinearity of a graph as the proportion
of collinear triples of points in the graph, i.e.,

    $$CL = \frac{\text{number of collinear triples}}{\binom{N}{3}}$$

where CL denotes collinearity (Capobianco, 1980a). The following
result relates collinearity with mean distance.

Theorem 3.3. For any connected graph G of order at least three,

    $$CL \ge 3(\mu_D - 1)/(N-2).$$


Thus, collinearity is also related to compactness. Equality
holds above for graphs which are geodetic. This means that they
have exactly one shortest path (geodesic) between any two points.
We define the geodeticity, g, of a non-complete graph by

    $$g = 3(\mu_D - 1)/((N-2)\,CL)$$

and take g = 1 for a complete graph. This is a measure of how
geodetic a graph is which can be estimated by observing only
distances!
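
Collinearity and geodeticity depend only on the distance matrix, so both can be computed directly from pairwise distances. The sketch below assumes a connected graph on points 1..N with N at least three; the function names and the two test graphs (a 4-cycle, which is not geodetic, and a path on three points, which is) are illustrative choices, not examples from the paper.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def all_distances(order, edges):
    """Pairwise distances of a connected graph, by breadth-first search."""
    neighbors = {v: [] for v in range(1, order + 1)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    table = {}
    for source in neighbors:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in neighbors[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        table[source] = dist
    return table


def collinearity_and_geodeticity(order, edges):
    """Return (CL, mean distance, g) for a connected graph of order >= 3."""
    d = all_distances(order, edges)
    points = range(1, order + 1)
    triples = list(combinations(points, 3))
    collinear = sum(
        1 for a, b, c in triples
        if d[a][b] + d[b][c] == d[a][c]      # b lies between a and c
        or d[b][a] + d[a][c] == d[b][c]      # a lies between b and c
        or d[a][c] + d[c][b] == d[a][b]      # c lies between a and b
    )
    cl = Fraction(collinear, len(triples))
    pairs = list(combinations(points, 2))
    mean = Fraction(sum(d[u][v] for u, v in pairs), len(pairs))
    bound = 3 * (mean - 1) / (order - 2)
    g = Fraction(1) if cl == 0 else bound / cl   # g = 1 for a complete graph
    return cl, mean, g


print(collinearity_and_geodeticity(4, [(1, 2), (2, 3), (3, 4), (4, 1)]))
print(collinearity_and_geodeticity(3, [(1, 2), (2, 3)]))
```

The 4-cycle has two shortest paths between opposite points and yields g = 1/2, while the three-point path is geodetic and yields g = 1.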


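3.2 Uniform Distance Distributions. There are graphs with distance
distributions which are uniform over part of the range.
Figure 15 shows two examples.


Fig. 15


In fact, we have the following:


Theorem 3.4. Any cycle of odd order has distance distribution

    $$q_i = \begin{cases} \dfrac{2}{N-1}, & i = 1,2,\ldots,\dfrac{N-1}{2} \\[4pt] 0, & i = 0,\ \dfrac{N+1}{2},\ldots,N-1. \end{cases}$$

Theorem 3.5. Any cycle of even order has distance distribution

    $$q_i = \begin{cases} \dfrac{2}{N-1}, & i = 1,2,\ldots,\dfrac{N}{2}-1 \\[4pt] \dfrac{1}{N-1}, & i = \dfrac{N}{2} \\[4pt] 0, & i = 0,\ \dfrac{N+2}{2},\dfrac{N+4}{2},\ldots,N-1. \end{cases}$$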
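
Both cycle theorems can be checked numerically, since the distance between points i and j of an N-cycle is min(|i - j|, N - |i - j|). The sketch below is a small illustrative check with a hypothetical function name.

```python
from collections import Counter
from fractions import Fraction


def cycle_distance_distribution(order):
    """Return [q_0, ..., q_{N-1}] for the cycle on `order` points (order >= 3)."""
    counts = Counter(
        min(j - i, order - (j - i))
        for i in range(order)
        for j in range(i + 1, order)
    )
    pairs = order * (order - 1) // 2
    return [Fraction(counts.get(i, 0), pairs) for i in range(order)]


print(cycle_distance_distribution(7))   # odd order
print(cycle_distance_distribution(6))   # even order
```

For N = 7 the nonzero values are all 1/3, for distances 1 to 3; for N = 6 they are 2/5, 2/5 and 1/5, for distances 1, 2 and 3.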


A graph is called bipartite if the set of vertices can be
partitioned into two sets of nonadjacent points. If all adjacencies
possible between the two sets of points are present, then
the graph is called a complete bipartite graph. We can show
the following:


Theorem 3.6. Complete bipartite graphs in which the two sets of
vertices have cardinalities

    $$\binom{K+1}{2} \text{ and } \binom{K}{2}, \qquad K = 2,3,\ldots$$

have distance distribution $q_1 = q_2 = 1/2$.


The graph in Figure 16 has distance distribution
$q_1 = q_2 = q_3 = 1/3$, so that Theorems 3.4 and 3.6 do not yield all
graphs with "uniform" distance distributions. In fact, the graph
in Figure 17 is not complete bipartite but has distance distribution
$q_1 = q_2 = 1/2$.


Fig. 16                              Fig. 17


Because the existence of a path of length k implies the
existence of one of lengths k-1, k-2, and so on to one of
length 1, one may get the impression that a distance distribution
must be "non-increasing." This is false. Figure 18 gives
examples for which the mode is central, and my colleague, F.
Buckley, found the graph of Figure 19 for which the mode is the
diameter!


Investigating the properties of these distributions appears
to be a fruitful field of research. What relations can we derive
between $\mu_D$ and $\sigma_D^2$? Between $\sigma_D^2$ and the diameter? How can we
characterize graphs with uniform distance distributions?


3.3 Miscellaneous Questions. Work has been done in the area of
distance distributions of random graphs. See Karoński (1978),
Meir and Moon (1978, 1975), Moon (1971).


Fig. 18                              Fig. 19


For digraphs, distances must be directed. If two points 
do not have a dipath (directed path) between them, they are at 
infinite distance. Figure 20 shows an example. No work at all, 
to our knowledge, has been done on distance distributions of 
digraphs. We suspect this area is rich with strange results. 


4. CYCLE DISTRIBUTIONS 


4.1 Introduction. The length of a cycle is the number of edges,
or points, in it. A cycle of length k is called a k-cycle.


Defining the cycle distribution of a graph presents rather serious
problems. It is easy enough to tally up the frequencies of
3-cycles, 4-cycles, etc., but deciding what relative frequency,
i.e., probability, should mean causes severe difficulties. One
might say it is simply the number of k-cycles divided by the
total number of cycles. However, doesn't it seem more reasonable
that the probability of a k-cycle, $r_k$, say, should be

$$r_k = \frac{\text{the number of } k\text{-cycles}}{\binom{N}{k}}\,?$$


These r's do not, of course, sum to unity! Perhaps we should
take

$$r_k = \frac{\text{the number of } k\text{-cycles}}{\binom{N}{3} + \binom{N}{4} + \cdots + \binom{N}{N}}\,.$$

Here, the denominator is the total number of sets of points of
cardinality at least three. But this is surely wrong. We
should use as numerator the number of these sets which contain a
k-cycle. But then what do we do when one of them contains more
than one k-cycle, as, for example, in the graph of Figure 21?
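The difficulty can be made concrete on a small graph. The sketch below counts k-cycles by brute force and evaluates three candidate ratios: the share of all cycles, the count divided by the number of k-point sets (assuming that is the intended denominator of the second proposal), and the count divided by the number of point sets of cardinality at least three. The function names and the choice of the complete graph on four points are illustrative assumptions; the graph of Figure 21 is not reproduced here.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def count_cycles(n, edges):
    """Return {k: number of k-cycles} for a simple graph on points 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    found = {}

    def extend(start, path, visited):
        for nxt in adj[path[-1]]:
            if nxt == start and len(path) >= 3:
                found[len(path)] = found.get(len(path), 0) + 1
            elif nxt > start and nxt not in visited:
                extend(start, path + [nxt], visited | {nxt})

    # Each cycle is rooted at its smallest point and is traced once in
    # each direction, hence the division by two.
    for start in range(n):
        extend(start, [start], {start})
    return {k: c // 2 for k, c in sorted(found.items())}


def candidate_ratios(n, edges):
    """For each k: (share of all cycles, count / C(n,k), count / #sets of size >= 3)."""
    cycles = count_cycles(n, edges)
    all_cycles = sum(cycles.values())
    big_sets = sum(comb(n, size) for size in range(3, n + 1))
    return {
        k: (Fraction(c, all_cycles), Fraction(c, comb(n, k)), Fraction(c, big_sets))
        for k, c in cycles.items()
    }


k4_edges = list(combinations(range(4), 2))  # complete graph on 4 points
print(count_cycles(4, k4_edges))
print(candidate_ratios(4, k4_edges))
```

On this graph the single set of four points carries three distinct 4-cycles, so the second ratio equals 3 for k = 4 and cannot be read as a probability, while the third ratios sum to 7/5. This is the problem of several k-cycles on one point set raised above.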


Fig. 21 


5. OTHER DISTRIBUTIONS 


Holland and Leinhardt, who are among the originators of 
statistical inference in graphs, have worked with the distribution 
of the triads in digraphs (1970, 1975). These are the 16 possible 
subdigraphs that can be formed using three points. See also
Frank (1978a,b) and Wasserman (1977).


Capobianco (1978, 1979) used distributions based on the
interaction of pairs of points in a digraph. Thinking of a
digraph as a model for a sociometric choice pattern, two cases were
considered, fixed choice and variable choice. In the former, each
person (point) was restricted to naming exactly three friends.
In the latter, all friends in the group were to be named. Hence,
in the fixed choice scheme, each point has outdegree three, while
in the variable case a point can have any outdegree from zero to
N-1.

In the fixed choice case, examination of how pairs of
points interact yields only ten different underlying
configurations. These are shown in Figure 22. The encircled points in
the figure are the points of the pair in question. The
distribution of the configurations under randomness was computed so that
the distribution of a sample of pairs could be compared to
that expected under randomness. This technique seems to be
effective for obtaining indications of certain characteristics
of the population: clustering, symmetry, reciprocal choices, etc.


In the variable choice situation, a trivariate distribution
was derived for a pair of points. The random variables were
j, the number of lines between the points of the pair, K, the
number of points chosen by both points of the pair, and m, the
total number of points which were chosen by exactly one point of
the pair.


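This distribution is given by the probability function

$$\Pr(j, K, m) = f(j)\,\frac{(N-2)!}{K!\,m!\,(N-2-K-m)!}\;2^{m}\,P_1^{\,m+2K}\,P_0^{\,2(N-2-K)-m},$$

where $f(0) = p_{00}$, $f(1) = 2p_{01}$, $f(2) = p_{11}$, $P_0 = p_{00} + p_{01}$,
$P_1 = p_{01} + p_{11}$. The correlation matrix is

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & \rho \\ 0 & \rho & 1 \end{pmatrix}, \qquad \rho = -\sqrt{\frac{2P_1^{3}}{(2-P_0)\,(P_0^{2}+P_1^{2})}}\,.$$
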
Plotting K vs. m using three different symbols for the
values of j is a technique which seems to provide information
about the population digraph.
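As a plausibility check on the trivariate distribution, the sketch below evaluates a probability function of the form f(j) times a trinomial term for (K, m) and confirms that it sums to one. It rests on assumptions made here for illustration: j is independent of (K, m); each of the N-2 other points is chosen by both points of the pair with probability P1^2, by exactly one with probability 2*P0*P1 and by neither with probability P0^2; and p00 + 2*p01 + p11 = 1. The function names and the numerical values are hypothetical.

```python
from fractions import Fraction
from math import factorial


def pair_pmf(j, K, m, N, p00, p01, p11):
    """Assumed probability of (j, K, m) for a pair of points in a digraph on N points."""
    f = (p00, 2 * p01, p11)[j]          # lines between the two points of the pair
    P0, P1 = p00 + p01, p01 + p11       # a point is not chosen / is chosen
    neither = N - 2 - K - m             # other points chosen by neither
    if min(K, m, neither) < 0:
        return Fraction(0)
    ways = factorial(N - 2) // (factorial(K) * factorial(m) * factorial(neither))
    return f * ways * 2**m * P1 ** (m + 2 * K) * P0 ** (2 * (N - 2 - K) - m)


def summary(N, p00, p01, p11):
    """Total probability, covariance of K and m, and their squared correlation."""
    total = EK = Em = EKK = Emm = EKm = Fraction(0)
    for j in range(3):
        for K in range(N - 1):
            for m in range(N - 1 - K):
                p = pair_pmf(j, K, m, N, p00, p01, p11)
                total += p
                EK += p * K
                Em += p * m
                EKK += p * K * K
                Emm += p * m * m
                EKm += p * K * m
    cov = EKm - EK * Em
    return total, cov, cov**2 / ((EKK - EK**2) * (Emm - Em**2))


# Hypothetical dyad probabilities with p00 + 2*p01 + p11 = 1.
total, cov, corr_squared = summary(6, Fraction(1, 2), Fraction(1, 8), Fraction(1, 4))
print(total, cov, corr_squared)
```

Under these assumptions K and m are negatively correlated, and the squared correlation computed by enumeration agrees with 2*P1^3 / ((2 - P0)(P0^2 + P1^2)).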

COGRADUATION BETWEEN STATISTICAL DISTRIBUTIONS
AND ITS APPLICATIONS — A GENERAL REVIEW 

1. THE NOTION OF COGRADUATION AND ITS IMPLICATIONS
IN THE THOUGHT OF CORRADO GINI


The notion of cograduation is very elementary, and it seems
almost impossible that a whole body of doctrine could be
built on it. Yet that doctrine, owing to its organic unity and to
the extent of its subject and its implications, was and still is one
of the fundamental contributions to the trend of thought called the
"Italian statistical school".


If we consider a double statistical distribution, we
shall call correspondent the quantities actually associated in
the same classification unit, whereas we shall call cograduated
those occupying the same position in the respective classifications.
With regard to the notion of cograduation, several statistical
indexes have been put forward to measure how much, on average,
the correspondence between quantities differs
from cograduation.


Every proposed index obviously corresponded to specific
purposes of investigation, determined each time by the needs
of the concrete research. It was only in 1914, however, that
Corrado Gini clearly saw that the above
mentioned notions could serve as the basis of a complete
and unitary theoretical fabric, which would be called "concordance
theory". The work written in 1914 sets out only a few premises based
on the notion and the measure of dissimilarity. In the
following works, which cover a period roughly corresponding
to the First World War, Gini developed all the implications
following from the cited work, in order to define the concordance
theory. These works are not easy to read, and the reading becomes
particularly difficult when one has to grasp
the leading thread which links them and to
explain the essential steps along the way.


On this observation one may base the need
that Salvemini felt in the forties to take up this subject again, in
order to connect, or better to clarify the connection among, the
pieces of a mosaic. The author of this work, only two
years ago, deemed it opportune to study this subject once again,
to contribute to the explanation of the conceptual
premises from which Gini's whole theory results. Even if all
that is debatable from a theoretical point of view, it is also
true that both Salvemini and the author of this work have
considered only one of the concrete meanings that Gini's whole
theory may take.


We now examine briefly the foundations of this
theory. We start by pointing out the distinction among
characters, attributes that cannot be ordered, and classifications.
The first refer to quantitative aspects of reality and may
be divided into enumerable and measurable. The second refer
to qualitative connotations, and each of them differs in the same
way from all the others; therefore it is not possible to fix
a logical succession. The classifications, which refer only
to the places occupied by the entities at issue, can relate only
to characters or to attributes that cannot be ordered. For these
it is also possible to develop a theory equivalent to that
outlined for the characters.


If the notion of cograduation is considered in a strict
sense it regards only the comparisons between classifications.
But it is precisely through Gini's concept of dissimilarity
that this notion can also play an essential role in the
comparison between characters and, owing to a few propositions
that we are not going to consider now, in the comparison
among attributes that cannot be ordered.


As a matter of fact, if we dwell for a while upon the aspects
which are strictly connected with the notion of cograduated
entities or, if you like, of cograduation (meaning by that the
procedure of cograduating two successions having the same number
of terms), it is possible to arrive spontaneously and rapidly at the
determination of the cograduation indexes. But these imply
other concepts which involve Gini's whole theory.
Consequently it is necessary to mention them as well. These
concepts will then permit us to conceive a cograduation index as an
index of concordance between classifications. On the other hand,
the concept of concordance represents another main point of this
paper.

To introduce this notion it is necessary to dwell upon the
characters, owing to their chief importance and because what
will be said about the theory referring to the attributes that
cannot be ordered and to the classifications can always be obtained
as a particular case of what has been argued about the
characters.


Here the distinction between correspondent entities and
cograduated entities comes back as a main point of the paper.
To keep the treatment simple we shall always
consider them as pertaining to enumerable characters, and therefore
we shall speak again of quantities.


As a matter of fact, by the word "discordance" Gini also means
the difference between correspondent quantities. We
denote by ${}^{1}M$ the arithmetic mean of the discordances, that
is to say of the differences between correspondent quantities
taken in absolute value, and likewise by ${}^{2}M$ the
arithmetic mean of the squares of the discordances. When, instead
of the discordances, we consider the differences in absolute value
between cograduated quantities, or the squares of those differences,
we arrive, by calculating the mean, respectively
at the simple dissimilarity index and at the square of the
quadratic index.
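The distinction between correspondent and cograduated quantities can be illustrated on a small series. The sketch below assumes one pair of values per classification unit (all frequencies equal to one), so that cograduated quantities are simply those of equal rank after sorting each series. The function names and the data are hypothetical.

```python
from fractions import Fraction


def mean_differences(a, b):
    """Means of |a_i - b_i| and (a_i - b_i)^2 over units paired as given."""
    n = len(a)
    simple = Fraction(sum(abs(x - y) for x, y in zip(a, b)), n)
    quadratic = Fraction(sum((x - y) ** 2 for x, y in zip(a, b)), n)
    return simple, quadratic


def dissimilarity(a, b):
    """Same means between cograduated quantities (equal rank in each series)."""
    return mean_differences(sorted(a), sorted(b))


a = [1, 4, 2, 8]   # hypothetical data: one value per unit
b = [3, 1, 6, 5]
print(mean_differences(a, b))   # correspondent quantities
print(dissimilarity(a, b))      # cograduated quantities
```

The first pair of values corresponds to the two means of discordances, the second to the simple dissimilarity index and the square of the quadratic one. Sorting minimizes both means, which is why the dissimilarity indexes act as lower bounds in what follows.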


It is now interesting to consider the meaning that the value 
0 of the dissimilarity index (either simple or quadratic) may 
assume, not only to explain the meaning implied in this particular 
value of the index, but also because another fundamental concept 
for the concordance theory arises from it. 


We start by observing that to the value 0 of the indexes
at issue corresponds the notion of similar simple statistical
distributions (i.e. defined by equal quantities to which equal
relative frequencies correspond in an orderly way). If this
concept is assumed to hold for each pair of simple subordinate
statistical distributions forming a double statistical
distribution, and also for the two simple statistical distributions
forming the double statistical one, it is possible to arrive at
the definition of the association frequencies in the hypothesis
of independence. We denote by ${}^{1}M_0$ the arithmetic mean
of the differences in absolute value between quantities in the
hypothesis of independence, and by ${}^{2}M_0$ the arithmetic
mean of the squares of these differences in the same hypothesis.


It is now necessary to fix a criterion to verify whether
any concordance exists between the simple statistical distributions
forming the double statistical distribution,
reserving to ourselves the right of explaining later on the
sense of the expression "non-concordance". In that respect
the problem is very simple if we refer to squared discordances,
whereas it turns out rather complex if we consider the absolute
values. Therefore we shall develop only the quadratic
discordances. We shall say, together with Gini, that there exists
a concordance when the quantity ${}^{2}M$ is less than the
quantity ${}^{2}M_0$. If these quantities have equal value we shall
say, again agreeing with Gini, that between the simple statistical
distributions forming the double statistical distribution there
is indifference, a concept whose meaning is less
restrictive than that of independence. The equality at issue in
fact does not necessarily imply that the association frequencies
appearing in ${}^{2}M$ are the same ones characterizing the definition
of ${}^{2}M_0$. In this paragraph we do not consider the hypothesis
in which the quantity ${}^{2}M$ turns out greater than the quantity
${}^{2}M_0$. As a matter of fact in this case we usually speak of
discordance, but this is, in our opinion, a concept that cannot be
specified without introducing the notion of contrary distribution.
It would be premature to do so here, as we are going to deal with
this matter in Section 2. We recall that the difference
${}^{2}M_0 - {}^{2}M$ characterizes the nature of the criterion of
a-concordance by Gini. It is always sufficient to refer to it when
squares of differences are used.
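A short derivation shows why the sign of this difference is informative. It is a sketch under assumptions made here: there are $n$ classification units, each carrying one pair $(a_i, b_i)$, and the association frequencies under independence are the products of the marginal frequencies. The symbols $M_A$, $M_B$, $\sigma_A^2$, $\sigma_B^2$ denote the means and variances of the two simple distributions.

```latex
\begin{align*}
{}^{2}M   &= \frac{1}{n}\sum_{i=1}^{n}(a_i-b_i)^2
           = \sigma_A^2+\sigma_B^2+(M_A-M_B)^2-2\,\mathrm{Cov}(A,B),\\
{}^{2}M_0 &= \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}(a_i-b_j)^2
           = \sigma_A^2+\sigma_B^2+(M_A-M_B)^2,\\
{}^{2}M_0-{}^{2}M &= 2\,\mathrm{Cov}(A,B).
\end{align*}
```

Under these assumptions the quadratic criterion reduces to the sign of the covariance: positive for concordance in a strict sense, zero for indifference, negative for discordance. A zero covariance does not imply independence, which agrees with the remark that indifference is the less restrictive notion.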


To define a concordance index (it does not matter whether it is
considered in a strict sense or in a broad sense) it is necessary
to relate the above mentioned difference to the maximum value it
can assume, since only in this case will the index vary in a
definite interval. Here another problem arises, regarding the
way of interpreting the maximum of concordance.


For simplicity's sake we shall face it referring to the
notion of concordance in a strict sense. Given the simple
statistical distributions forming the double one, we are sure to
have a maximum of concordance (in a strict sense) when the
mean value ${}^{2}M$ coincides with the square of the quadratic index
of dissimilarity between those distributions. It is clear,
however, that the concordance may be perfect only when these
latter distributions turn out similar (or, if you like, equal,
since in this context the notion is not restrictive). In
concrete applications two cases may occur: in the former case
the simple statistical distributions forming the double one
are logically antecedent to the association of the quantities
at issue in the classification units; in the latter they are
consequent.¹¹ It is possible to understand then that in the
former case the maximum of concordance must be the one we have
already pointed out, and therefore we shall speak of relative
maximum of concordance. In the latter case the simple statistical
distributions are not, conceptually, conditions on the
possible ways in which the associations may take place.
Therefore they can be changed so as to coincide, i.e. so that the
quadratic index of dissimilarity becomes null; we shall
speak then of absolute maximum of concordance (in a strict
sense).


With reference to the relative maximum of concordance it
is possible to outline the definition of the quadratic indexes of
homophily, whereas as regards the absolute one it is possible
to outline the definition of the correlation indexes.


As far as the attributes that cannot be ordered are
concerned, we shall call indexes of attraction the concordance
indexes based on the notion of relative maximum, and resemblance
indexes those referring to the notion of absolute maximum.
Although it is hardly necessary, we observe that we refer to
quadratic indexes in this context.


Gini's theory does not consider the distinction between
relative maximum and absolute maximum with regard to quadratic
indexes of cograduation, that is to say, as we have already
mentioned, with regard to indexes pertaining to the concordance
between classifications. As a matter of fact, if to each entity
given at the beginning we assign a different place in the
classification, it can easily be deduced that the relative
maximum and the absolute maximum of concordance coincide, and
therefore Gini speaks only of the quadratic index of cograduation.


Recently, however, one of our collaborators felt the
necessity, justified by substantive research, of considering
sets of quantities with an equal classification place and
consequently of introducing the frequency of each set. This has
led him, also in terms of quadratic cograduation indexes, to
take up again Gini's theory as to the distinction between relative
maximum and absolute maximum of concordance, by the introduction
of a quadratic index of cograduation that makes use of the
relative maximum of concordance.¹⁵ With this contribution the
picture that Gini had drawn as far back as 1916 with reference to
the Concordance Indexes is completed.


2. QUADRATIC INDEXES OF CONCORDANCE 


In the preceding pages we have observed that the
quantitative foundation of the concordance measure lies in
the difference ${}^{2}M_0 - {}^{2}M$. We have also observed that it is
possible to attain a concordance index by relating the above
mentioned quantity to the maximum value that it can assume,
which, in the hypothesis of relative maximum, turns out to be
${}^{2}M_0 - {}^{2}D^{2}$, where ${}^{2}D^{2}$ is the square of the
quadratic dissimilarity index, which, as is well known, measures the
minimum possible distance between the two simple statistical
distributions forming the double statistical
distribution. The quotient between the first and the second
difference we have mentioned characterizes the quadratic homophily
index. It is obvious that this construction makes sense if
we speak of concordance in the strict meaning, that is, supposing
that ${}^{2}M_0 - {}^{2}M > 0$. The problem that consequently arises,
following the o-criterion of concordance, which consists in
analysing and interpreting the sign of the difference
${}^{2}M_0 - {}^{2}M$, regards the arguments which are to be developed
when this sign is negative. We can state without any hesitation that
in this case between the simple statistical distributions forming the
double statistical distribution there is discordance. Now
it must be carefully analysed on what conceptual foundation the
maximum value that this negative difference may assume must be
determined. The purpose of this is to construct a quadratic
index of discordance.
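The quotient that defines the quadratic homophily index can be sketched numerically. The example assumes one pair of values per classification unit, takes the mean under independence over all n*n pairings, and obtains the squared quadratic dissimilarity index by sorting both series. The function names and the data are hypothetical.

```python
from fractions import Fraction


def mean_square_difference(a, b):
    """Mean of (a_i - b_i)^2 over correspondent quantities."""
    return Fraction(sum((x - y) ** 2 for x, y in zip(a, b)), len(a))


def mean_square_independence(a, b):
    """Mean of (a_i - b_j)^2 over all pairings, i.e. under independence."""
    return Fraction(sum((x - y) ** 2 for x in a for y in b), len(a) * len(b))


def homophily(a, b):
    """(2M0 - 2M) / (2M0 - 2D^2); meaningful when 2M0 - 2M > 0.

    Raises ZeroDivisionError when 2M0 equals 2D^2 (e.g. constant series).
    """
    m = mean_square_difference(a, b)
    m0 = mean_square_independence(a, b)
    d2 = mean_square_difference(sorted(a), sorted(b))
    return (m0 - m) / (m0 - d2)


a = [1, 4, 2, 8]   # hypothetical data: one value per unit
b = [3, 1, 6, 5]
print(homophily(a, b))
```

The index equals 1 when the correspondent quantities are already cograduated, since the numerator and the denominator then coincide.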


This problem is actually examined and solved by Gini by
using the notion of cograduation.¹⁷ If we consider
the arithmetic mean of the squares of the differences between
contragraduated quantities, this quantity is known to be a
maximum, and therefore the structure of the quadratic index of
discordance cannot be formally equal to that of the quadratic index
of concordance (in a strict sense). Gini then follows a procedure
whose spontaneity we are not able to understand. But the
conceptual question is another one; it consists in pointing out
that by operating in this way it is not possible to use the
notion of dissimilarity index, which is certainly one of the main
points of the whole theory. On the other hand the notion of
contragraduation is in itself a formal variant of that of
cograduation, and hence the need clearly arises of introducing
a concept that may give a theoretical foundation, consistent
with the given premises, to the discordance indexes on which the
quadratic index has still got to operate. On the other
hand it was Gini himself¹⁸ who clearly felt the logical necessity
of introducing a notion that was used only to overcome the
less brilliant aspects of the construction we have already referred
to. It is the notion of contrary. If we denote by $a_i$ the
generic modality of the simple statistical distribution A,
component of the double statistical distribution (A, B), we shall
denote by $\alpha$ the simple statistical distribution contrary to the
distribution A, whose modalities are defined by the following
relations: $a_i + \alpha_{m-i+1} = K_A$, $m$ being the number of the
modalities of the distribution A and $K_A$ the constant of addition;
once this constant has been fixed the distribution $\alpha$ is
determined. As for the frequencies, it is almost unnecessary to remark
that the frequency of $\alpha_{m-i+1}$ equals that of $a_i$. In practice
the problem may be solved by assigning the arithmetic mean of the
distribution $\alpha$, as it is clear that $M_A + M_\alpha = K_A$. As a
matter of fact here one should distinguish between complementary
characters and non-complementary characters, since for the first ones
there is no problem of choice as to the definition of the contrary
distribution. In any case, when from the modalities of the character
we pass on to the deviates (from the arithmetic means), we
obtain $a_i + \alpha_{m-i+1} = 0$, and therefore for the deviates there
is not the problem of any subjective choice in order to define
the contrary distribution.


We shall call, agreeing with Gini, inverse quadratic index
of dissimilarity the dissimilarity index built between a
simple statistical distribution and the contrary of the other
simple statistical distribution forming the double statistical
distribution.


If we make use of the notion of inverse quadratic index of
dissimilarity, the quadratic index of heterophily turns out to be
logically constructed in the same way as the quadratic index of
homophily.
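One possible reading of this construction is sketched below: the heterophily index of (A, B) is computed as a homophily-type quotient between A and the contrary of B, with the inverse quadratic dissimilarity index in the denominator. The sketch assumes one pair of values per classification unit and takes the constant of addition equal to twice the arithmetic mean, the choice the text attributes to Fortunati. The function names and the data are hypothetical, and the code is an illustration rather than Gini's own formula.

```python
from fractions import Fraction


def mean(xs):
    """Exact arithmetic mean."""
    return Fraction(sum(xs), len(xs))


def contrary(xs):
    """Contrary series with addition constant K = 2 * mean (an assumed choice)."""
    k = 2 * mean(xs)
    return [k - x for x in xs]


def mean_square(a, b):
    """Mean of squared differences between quantities paired as given."""
    return mean([(x - y) ** 2 for x, y in zip(a, b)])


def heterophily(a, b):
    """Homophily-type quotient between a and the contrary of b."""
    beta = contrary(b)
    m = mean_square(a, beta)
    m0 = mean([(x - y) ** 2 for x in a for y in beta])   # independence
    inverse_d2 = mean_square(sorted(a), sorted(beta))    # inverse dissimilarity, squared
    return (m0 - m) / (m0 - inverse_d2)


a = [1, 4, 2, 8]   # hypothetical discordant data
b = [6, 1, 5, 3]
print(heterophily(a, b))
```

With this choice of constant the numerator equals minus twice the covariance of the two series, so the quotient is positive exactly when the original series are discordant, and it reaches 1 when they are perfectly contragraduated.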


From the conceptual point of view, however, the question
is more subtle and it regards the autonomy of the concept
of discordance. Indeed we have observed that the notion of
contragraduation is not characterized in itself, in our
opinion, but only as a formal variant of that of cograduation.
It seems to us logically meaningful to underline that this
critical interpretation may also apply to the notion of
discordance, in the sense that it appears as a formal variant of
the notion of concordance: what is involved in this case is the
concordance between a simple statistical distribution and the simple
statistical distribution contrary to the other, forming the
double statistical distribution.


It is almost superfluous to recall that the quadratic
index of heterophily is uniquely determined whichever
of the two simple statistical distributions forming the double one
is chosen for the contrary distribution.
In substantive scientific research the distinction that
Gini draws precisely between the comparison of
intensities, deviates and standardized variates²² may find
a concrete legitimation, even if it is true that Fortunati
clearly maintains that the use of the comparison between deviates
or between standardized variates must actually imply the logical
justification of the comparison between intensities. On the
other hand the three meanings of the mentioned quantities are
connected, in geometric terms, by elementary linear relations.
Welcoming this distinction, commonly accepted in applications,
we recall that the quadratic indexes of heterophily and
homophily keep the same formal structure in the case of the
concordance measure between deviates and in that between
standardized variates. Gini regards this result as a negative aspect
of the quadratic indexes at issue. In our opinion, however, the
problem is, on one hand, to distinguish between the formal
structure of an index and the possibility of interpreting the
index itself in different ways and, on the other, to think again
carefully about the observation made by Gini in the light of
the argument developed by Fortunati.


We are going to deal now with the quadratic indexes of 
concordance which refer to the notion of absolute maximum of 
concordance. 


If we consider the concordance in the broad sense, the
above mentioned indexes, which we call, agreeing with Gini, indexes
of correlation, must be able to vary in the closed interval
[-1, +1], assuming the value -1 in the hypothesis of absolute
maximum of discordance, and the value +1 in that of absolute
maximum of concordance.


Having set in the above mentioned terms the problem concerning
the construction of a quadratic correlation index, it is clear
that the quadratic index of dissimilarity between the simple
statistical distributions A and B (or that between the respective
deviates or between the respective standardized variates) is
to be null, and the quadratic dissimilarity indexes built
with regard to the distributions $\alpha$ and B and to the distributions
A and $\beta$ are to be simultaneously null too. In order that
this might happen it is necessary for the four above mentioned
distributions to turn out symmetrical and coincident. Gini
expresses himself in this way. It is not a matter of verifying
whether these distributions are symmetrical and coincident or not.
It is logically possible to define a double statistical
distribution whose simple statistical distributions are symmetrical
and coincident, and coincident with the respective contrary
distribution as well. This aim can be easily achieved by defining
a simple statistical distribution as the arithmetic mean of the
distributions A, B, $\alpha$ and $\beta$.²⁶ For the construction of the
denominator of the quadratic indexes of correlation we make
use of a distribution which is symmetrical and coincident
with the respective contrary distribution; from this fact it
follows that both the direct quadratic indexes and the inverse
quadratic indexes of dissimilarity are simultaneously
null. The use of the plural in the last remarks may seem wrong, as
it is obvious that, whereas a pair of indexes is obtained with
reference to the relative maximum of concordance, with regard to
the absolute one only one index is built. This
plural, however, concerns another aspect of the problem that
we are going to clear up immediately. It is in fact easy to
demonstrate that three quadratic indexes of correlation may be
obtained; they refer respectively to the measure of concordance
between intensities, deviates and standardized variates.²⁷

Among the pieces of this great mosaic we find again the
coefficient of correlation of Bravais-Pearson in the quadratic
index of correlation between standardized variates.
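The link with the Bravais-Pearson coefficient can be made explicit with a short computation. It is a sketch under assumptions made here: $x_i$ and $y_i$ are the standardized variates of $n$ units (mean 0, variance 1), $r = \frac{1}{n}\sum_i x_i y_i$, and the denominator of the index reduces to ${}^{2}M_0$ because the quadratic dissimilarity index is null under the absolute maximum of concordance.

```latex
\begin{align*}
{}^{2}M   &= \frac{1}{n}\sum_{i=1}^{n}(x_i-y_i)^2 = 1 + 1 - 2r = 2(1-r),\\
{}^{2}M_0 &= \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}(x_i-y_j)^2 = 1 + 1 = 2,\\
\frac{{}^{2}M_0-{}^{2}M}{{}^{2}M_0} &= \frac{2-2(1-r)}{2} = r .
\end{align*}
```

Under these assumptions the quotient takes the value $+1$ when $x_i = y_i$ for every unit and $-1$ when $x_i = -y_i$, as required of an index of correlation.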


We would like to stop our exposition with the last sentence,
but this would mean believing that there are conclusive moments
in scientific research.


As a matter of fact, if we consider the structure of the
quadratic index of correlation between intensities we realize²⁸
that this is not equal to +1 in the hypothesis of absolute
maximum of concordance, that is to say when the distribution A
coincides with the distribution B and likewise the distribution
$\alpha$ coincides with the distribution $\beta$.²⁹


Unless we suppose $M_A = M_\alpha$ and also $M_B = M_\beta$, this value
cannot be found. But if these conditions are imposed, the structure of
the index given by Gini, which contains the four arithmetic
means we have just recalled, is no longer justified. It is
not clear how this index may be changed or, better, how the
measure of the concordance among intensities may be formulated,
so that it is possible to fix an index that avoids the above
mentioned troubles.


On the other hand, when Fortunati suggests defining the
constant of addition as twice the arithmetic mean of the simple
statistical distribution assigned (so as to avoid any
arbitrary choice in the definition of the contrary distribution),
he implicitly points out a way of solving the question in hand.
It is obviously taken for granted that the formal structure of the
index will appear with a less general meaning than that given by
Gini.


It is also true that Gini himself distinguishes between
characters which admit a complementary character and characters
for which the complementary one is not determined or determinable.


In respect to a character admitting the complementary
one, Gini argues that it is natural to assume this latter as
the contrary of that given at the beginning. In this case,
however, we are not able to accept the methodical indication given
by Fortunati.


NOTES 


1. C. Spearman, "Footrule" for measuring correlation, in "The
British Journal of Psychology", II Vol., 1906. This index
was found again by K. Pearson, in a context involving
a different kind of substantive problem, in the work On
further methods for determining correlation, in "Draper's
Co. Research Memoirs, Biometric Series" II, 1907. A different
cograduation index is that suggested by Kendall in the work
A New Measure of Rank Correlation, in "Biometrika", XXX Vol.,
1938. These indexes, but not the simple cograduation index
by Gini which we are going to mention in note 7, turn out
to be particular cases of a cograduation index introduced

"cluster analysis", in "Metron", XXX Vol., no. 1-4, 1972.

C. Gini, Di una misura delle relazioni tra le graduatorie
di due caratteri, appendix of the monograph Le elezioni
generali del 1913 nel Comune di Roma, Cecchini, Roma, 1914.
I wish to thank Professor Tommaso Salvemini, who obtained
a copy of this appendix for me. C. Gini, Indici di concordanza,
cit. The index at issue is a simple cograduation
index and consequently we are not going to deal expressly
with it in this work; we only recall that it is constructed
making use of the criterion of concordance 6 (without
losing anything from a general point of view).


It is well known that on the theory of dissimilarity
Gini also builds the connection theory, thus giving the
word dependence a clear and original meaning. The simple
index of connection appears for the first time in the work
Di una misura della dissomiglianza tra due gruppi di
quantità e delle sue applicazioni allo studio delle
relazioni statistiche, cit., the quadratic one in the
work Nuovi contributi alla teoria delle relazioni statistiche.
From the conceptual point of view to those that led Gini
to the determination of the homophily indexes. In 1957
Castellano called these indexes "total indexes of
connection" and introduced the "global" indexes,
founding them on a principle equivalent to the one which
characterizes the correlation theory according to Gini.
See V. Castellano, Contributi alla teoria della correlazione
e della connessione tra due variabili, in "Metron", XVIII
Vol., no. 3-4, 1957. See also: A. Gili, Aspetti fondamentali
della teoria della connessione e concordanza nell'impostazione
di Corrado Gini, publication of the Istituto e Scuola di
Statistica dell'Università di Bologna, Cooperativa Libraria
Universitaria Editoriale, Bologna, 1970.


C. Gini, Sul criterio di concordanza tra due caratteri,
cit.


The phrase "concordance in a strict sense" refers to the
hypothesis ${}^{2}M_0 - {}^{2}M > 0$. The phrase "concordance in a
broad sense" or simply "concordance" means concordance in
a strict sense or discordance; it is therefore necessary to
specify this latter word, and to this end we defer the
explanation to paragraph 2.2.


The suggested distinction is somewhat schematic. For more
details see C. Gini, Indici di omofilia e di rassomiglianza
e loro relazioni col coefficiente di correlazione e con
gli indici di attrazione, cit.; A. Gili, Indici quadratici
di concordanza, cit.

In a strict sense.


In the memoir Indici di omofilia e di rassomiglianza e
loro relazioni col coefficiente di correlazione e con gli
indici di attrazione, cit., Gini comes to the construction
of a quadratic resemblance index, which turns out to be the
obvious extension, for the attributes that cannot be ordered,
of the quadratic correlation index between variations (referred
to characters). In the memoir there is also the notion
of qualities, deviations and mutations, which refer to the
attributes that cannot be ordered as those
of intensities, deviations and standardized variates respectively
refer to the characters. Later on, in the memoir Nuovi
contributi alla teoria delle relazioni statistiche, cit.,
Gini introduces the quadratic index of attraction, which has
as its logical equivalent the quadratic index of homophily.
From Gini's text we can deduce that this index keeps
the same structure with respect to the measure of concordance
between qualities, between deviations or between mutations,
just as happens for the quadratic indexes of homophily,
as mentioned in paragraph 2.3. In analogy with
the distinction which marks the quadratic indexes of correlation,
which we are going to deal with in paragraph 2.4,
Gini defined in the memoir Indici di concordanza, cit., the
quadratic index of resemblance among qualities, the one among
deviations and that among mutations.


C. Gini, Indici di concordanza, cit. The quadratic index of
cograduation by Gini coincides with the quadratic index of
cograduation introduced by Spearman in the work cited in
note (1). The demonstration is due to Gini himself and
appears in the work cited in that note.


G. Bettuzzi, Sulle relazioni fra forme di insediamento
e movimento naturale della popolazione emiliana nel periodo
1951-71, in course of print in the volume "Studi in onore di
Paolo Fortunati", Cooperative Libraria Universitaria
Editoriale, Bologna. Even if in the present work we do not
deal with simple indexes of concordance, we must remember
that in 1939 Salvemini introduced a simple index of
cograduation with regard to the notion of relative maximum of
concordance. See T. Salvemini, L'indice di cograduazione del
Gini nel caso di serie statistiche con ripetizione, in
"Metron", Vol. XIII, no. 4, 1939.


C. Gini, Sul criterio di concordanza fra due caratteri,
cit.


C. Gini, Nuovi contributi alla teoria delle relazioni
statistiche, cit.


C. Gini, Nuovi contributi alla teoria delle relazioni
statistiche, cit.


A. Gili, Alcune esplicitazioni sui fondamenti della teoria
della concordanza, cit. Id. Id., Indici quadratici di
concordanza, cit.


C. Gini, Nuovi contributi alla teoria delle relazioni
statistiche, cit.


A. Gili, Indici quadratici di concordanza, cit.


C. Gini, Indici di omofilia e di rassomiglianza e loro
relazioni col coefficiente di correlazione e con gli
indici di attrazione, cit.


P. Fortunati, Alcune considerazioni sulla impostazione
giniana delle misure di concordanza, in "Atti della XXI
Riunione Scientifica della Societa Italiana di Statistica",
Bologna, 29th - 30th May 1967.


A. Gili, Sulle proprieta metriche delle costanti statistiche,
in "Statistica", Year XXXII, no. 3, 1972.


C. Gini, Indici di omofilia e di rassomiglianza e loro
relazioni col coefficiente di correlazione e con gli
indici di attrazione, cit.; Id. Id., Indici di concordanza,
cit.


T. Salvemini, Lezioni di Statistica Metodologica, part III,
Le relazioni statistiche, cit.; P. Fortunati, Alcune
considerazioni sulla impostazione giniana delle misure di
concordanza, cit.; A. Gili, Alcune esplicitazioni sui
fondamenti della teoria della concordanza, cit.


C. Gini, Indici di concordanza, cit.


A. Gili, Una postilla sull'indice quadratico di correlazione
tra intensita, in "Statistica", Year XXXVIII, no. 1, 1978.


C. Gini, Indici di concordanza, cit. Parallel considerations
hold for the absolute maximum of discordance and the
conditions under which the index assumes the value -1.




AUTHOR INDEX 


Afiti. AzmAv age ally 23 

Ahmed, A. N., 350, 359 

Ahmad, *Ms,.--1:2;,, 1:23 

Ahsanullah, M., 314, 317 

Aitchison, J., 134, 139, 363, 
366), 3645 369%% 3/25 6373, 
374 e377 ,¢ 378, 3802 383, 
384 

Aiuppa, T. A., 254, 256 

Amato, P., 133

Amos, D. E., 214 

Anderson, T. W., 147, 154, 
225, 229, 346, 394 

Arjas, E., 284 

Arpoldne Bit C.. 275). 27 7iei20 Sis 
279, 280, 284 

Ayer, M., 27, 32 

Azzalini, A., 55 


Bahadur, R. R., 167, 220, 229

Bainweier)s « Lal, L773, 79), 
Didntee? 4ER2 86 48 285 

Balkema, A. A., 327, 333 

Bargmann, R. E., 157, 158, 
U59n16835° 2415 9242, 
245). 24768 252g 256 

Batlow wr <0 Ee. 127562844325 
ZZ eZ Ias e214, 218% 
28056 282% 2855-310,43155 
BI se 5 322 WO Zoe SOUS 
360 

Barndorff-Nielsen, O, 88, 94, 
108 

Barnete,ave .. 208, 20457205, 
ZOT<* 2098 S213 

BartlettseM.Ss, 36, 48,2158, 
220 2229 30 26256210 

Bastin Awe, Oost 7 Oj wl 72 5 
lg Sly Osc! Sete B80 52 272. 
Zit 2D ee Dae 2 8bee2 910, 
B06, 6352), G56, 337 .+o38, 
339 »< 3405 6 3415) 3425+,343, 
345, 346 

Bates AG aE. 589/11 

Battle, J., 412

Behood ans pureed 29 123 

Belyayev, Y. K., 13, 22 




Bemis, B. M., 171, 173, 179, 
27 285 

Benettin, G., 358, 360 

Berge, C., 401, 413 

Berk, B. H., 161, 162, 167 

Berkson, J., 30, 32 

Berlin, B., 346 

Berman, S. M., 337, 346 

Berndt, G. D., 27, 32 


Bhattacharya, C. G., 112, 123

Bhattacharya, G. L., 171, 172, 
L735) Aa 9 

Bhattacharya, G. K., 182, 191, 
274, 285 

BLiddkar $.5%,,.525) 59% 605q625 
68, 269, 972, 290, 293, 307 

Billingsley, P., 323, 325 

Birnbaum, Z. W., 336, 346 


Bishops. D.0.).56220.9229 
Blischke, W. R., 112, 123 
Block. Ha, 272862755 27958285 
Block, (Hor Wiesel ogee oe 
ZIG W279, 2805.29 2828 
2898 2845) 2855, 291 30s 
S4Ne 34254 3465. 35055 360 
BlomyG. ko, 122 
Blum, oJieak «gel eel 9) el 2), 
15386 15450698 180.8 3526 
360 
Blumenthal, S.yalls lO, 7 os 
80s 8h. 62, 88.584, e008 
330,2833 
Boswelleam. glag C0... Ol5e7 1, 
ids 123,189 
Bouvers He, 242,247, 2525. 250 
Bowdens&D.aG.,.0125, 1245 o25 
BownangeK.2 0,0 12,022 813m. 
U3S96 230% 42382, 9250. 40 
Box, 2G). ECAP. 2458; 220542245 
225 5 226,229 50 200, e270 
Brain, CELW. walone 7.22 
Bray, TsAA.,6120,6024 
Bresenham, Jin Bene / soo 
Breimd Leva shies Cory clo lee SOF 
Brodsky, J., 346 
Brown, J. A. C., 134, 139
Brunk, H. D., 27, 32




Bryant, J. (lesecl 9s 286 
Bryson, Ges 13205, 330, 392 
Buchanan, W. B., 282, 283, 286 
Buckley, F., 409 


Capobianco, M., 397, 400, 404, 
405, 407, 411, 413 

Cassandro, M., 358, 360

Chanda, I. C., 95, 365 43,247 

Chang; Wee C. ,-dl2,R 1285.25 

Chat£Lleldyec, 5 5/58595 va 

Chen. Hts Joseo,e Zo 

Cher a T.2Cs52 937 94 

Chernoff, H., 44, 48 
iyey2 = AMSHL 

Chiang, C. L., 346

Chtéppay Mis 133, 7134, %1359 
139 

Chuneeek alae) 550, 0505 

Cintareebn. 30,8533 

Clark; WegAah 26,6 2/753 lees, 
274, 286 

GIS fomds. Pkye O40 

Cobb5s L.5 no” 

CoberilyyeWa, Ae, 25 12 

Cohen; Aes, Tee 1035. 285 
LOIS. Lo2ke L67ee 246ne 250 

Coleman, R., 38/7, 392, 393, 
394 

Consul eR. Cee 22seeee2On ello 

Cook, R., 404, 405, 414

Cooke Wee line Lov 

Corbet, fA .0S* .ao pearie 

Cottertii, De (S.. 1mm eaalayy 
A Shs) S We lesy poe yids. 

Cox, Dia Rey 20d tae ee Ole 
210, 4380, 6383 

Cozzolino, J.eM.,.28kuo5 

Crain, eee ks Oly OO tmod 

Craner, Os, 40. ) 49 

Cramer-Von Mises, 141, 155 

Csorgo,/M., . 141, 148,,246, 
147,°148,.149, 8150, 

152, 153, 154, 155 
Ggereo!,¢S.5555 


Deaeostino, R.OB.,.75 llje22 

Dahyiay R. C., 28,. 33,).80s82% 
83, 84 

Dariinewmp. A., 147, 154 




David,.Hs, Avs 605-725 207, 
214, 280, 286, 336, 346 

Davidson, D., 366, 384 

Day 5) N. JB, Tay ol 23,0290; 
307 

DeHann’, *L. 5742/5 7333 

Dempster, A. P., 103, 108, 
119, 124 

Desu, M. M., 343, 346 

DeWet, 1... 53,5 

DiCastro, C., 358, 360 

Dick, N. P., 112, 124

Dishan, M., 328, 334 

Downtons Fi5° 2752. 276;,°2795 
278%- 279),- 286 

Dugue, D., 147, 155 

Dunsmore, I. R., 377, 384,
385

Durbin, J., 8,700, °225*S 55 
1475-155 

Duttayuk.. 6435 049 

Dystraj~R.+ Li4 320,-92354 325 


Ebrahimi, N., 283, 286 

Edwards,~ Cc. By, 59,07 

Edwards, W., 366, 384, 385 

Ehlers;=-Ps Fs5+ 3875 3885" 389, 
390, 392,> 394 

Ehrenbergy>Ax *S.2 Gs, 7/56 9s 
7p 

Elandt-Johnson, R., 343, 346

Elderton, W. P., 242, 247, 
256 

El-Neweihi, E., 283, 284, 286 

Engelman, L., 224, 229

Esary, *J:*Ds ,*277,«279,2 2642 
282, 28342863" 3148=3175 
3205 325, +350%7-360 

Ewing, G. M., 27, 32


FarliesvD. UG. ,Pi70ne 180, 
290%= 307 

Feller, W., 332, 333

Fercho,*>Ws Wi) 133P22 

Fertig, Ky. Ws, L4eee3 

Filliben, J\ Rk, 5,56,%8.?22 

Fisher, R. A., S72 72 

Folks, DSi+Ls5- 460, 0167 

Frane, J. W., 224, 229

Francie, UR. S<, 4,2, soos 




Pranks) 0),.9°400,4402, 4045sw411% 
413, 414 

Freschet, M., 290, 307 

Freund, Jo¢£273) 274,82759 
27957280; . 286,80. 2905° 2935 
30752352, £346 

Friday, 9D. S35 eS176$3180f2275, 
286, 349 

Bryer ade Gt, ellos B12480125 

Rug kal... Ce SOW 


Gail, M., 343, 346 
Galambos, J., 274, 286, 309, 
SI OL See ola te 315.6316, 
SWS YE] 
Gardner, L. A., 182, 191 
Gdver5) D.2P. 58523) 325 
George, E. O., 160, 167
Choshywtiew.K.%, 336, 837523385 
339534056394 15. 343,-3459 
346 
Ghosh, M., 283, 286 
Ghiselli, E. E., 270
Ghurye, S. G., 346 
Giesbrecht, F., 134, 139 
Gili, A., 415
Gilliland, D. C., 340, 347
Gindler, Es M., ds 124 
Girdee Nn. L256 223 
Gnanadesikan, R., 204, 209,
214 
Gnedenko, B. C., 13, 22 
Cook, Ag? l).20 pe 40.04 
Goldberg, L. R., 366, 384 
Goodjel. J.,-3665<384 
Goodhart.* Ged. 5-57, - 593971 
Goodman, Wi, +289',):290,) 291, 
293',, 2951, 807 
Gorodetski ,. V-Ve, (363-48 
GottschalkGuawevH. ,« 112," 124 
Götze, F., 146, 155
Grannis, Gorrs,. ds, 124 
Graser, 6D... Azo Sits: 307 
Graybill, F. A., 38, 49
Gridgeman, N. T., 112, 124 
Griffith, W., 283, 286
Gross PACs Ji%> 25,4 26,273 28, 
315% 335° 7950 80, 825889 
84,6274, 275 5:.286 
Guldberg, A., 59, 72 




Gumbel; E..Js, 170, -180, +290; 
2985. 3075-341, 934295347 

Cuptase RR. C4 3275832958 5308 
333 

Gurland, W25859 38d 


Habbema, J. D. F., 384 

Hadwiger, H., 391, 394 

Hahn eGivads, 233522 

Hajek, J., 170, 180 

Hakimi, S., 401, 414 

Halfon,JEty 2455 256 

Hamming, R. W., 244, 256 

Hannon, J., 340, 347 

Harary, F., 413

Hardinerpe. ene tl2e 24: 

Haris, Cs Mi. Visa 22 

Harris, iR. 7 29135307 

HartersHoei. , 1555937 5e1s9 

Hartley,°H..0.; 805 84,.230 

Hasselblad, V., 112, 124

Hastings, Ne A. ede eel 32 

Hawkes, A. G., 276, 277, 278, 
2 ee 2OY 

Hawkins, D. M., 207, 214 

Healain M. Os. OR, 4) 204.209), «214 

Hernandez, F., 266, 267, 270 

Higgins Sus ,Js,a 071.6173, L798 
2742 28548 2905. 29156307 

Hida 8B. Ms, 139 

Hoeffding, W., 141, 149, 153,
155

HolliajaM.2S.,. 79), 84 

Holland, P., 411, 414

Holdliander), ) W.3,04327',) 333 

Holmes,* Pi... Ts:,: 33052333 

Hope, Eee oles SOO 

Hosmer, @D.e Wey) Irs, LIZ e205 
124 

Hsieh, «He Ks S463. 167). 220% 
229 

HiStieuG;- ala. ate OieeeeZon, 

Hurst, H. E., 354


Illinois Department of Trans- 
portation, 191 


Jagers, P., 330, 333
Janegs Es Rey ld? 124 
felineki.7Z., 765°83, 84 




Jennrich, R. I., 224, 229 

Joglekar, A. M., 77, 84

John, S., 112, 124

Johnsons. No.L.7 575 728828. 795 
84, 85,1. 43250134, 1895 
16955220252 214 52165, 82383 
23254240, 2423. 247562523 
25452256, 92575227 24213 
274, -279,2287,,,290,; 82923 
303, 307 

Johnson, R.fAs, :17L, (19258736 
179% 182 ZOU 25957 2665 
2674227050274 (285 

Jona-Lasinio, G., 358, 360 

Uembe Sa Wig5 Sly if 


Kac, a!) Go2, 000 

Kamins, M., 26, 33 

Karoński, M., 405, 409, 414

Kativyar, RoSee aloo .mO7 

KattilseS*s Kopp Oa saee2 

Kay; Ji. W.5 1366; 23735 382, 
383, 384 

Kellerer, A. M., 387, 393, 
394 

Kemp, pASEW.o'57, 99,8 622 63% 
6455.65, ,68, 695 72 

Kemps C.4 Din 8 OO. OF 2 

Kempthorne, 0., 134, 139 

Kendall, Ds-G.., 61%-—-G2k 


72 

Kendall Ms Gos Lf, 22e 
397,394 

Kettenringsid. Re; 204, 209% 
214 

Ka tr, Ch 2iGs, thoes ies ml ole 
22605230 


Khinchin, A. I., 368, 384 

Kibble, 275 

Kiefer, J.5; 97,9108,7141, 
£49 5: 152,0153, 2540 
169, +180 

Rierer; N. M., 119;.420,; 124 

Kimeldorf, G., 350, 360 

Kingman, J. F. C., 387, 391, 
394 

Kodama, Y., 413 

Kohberger, 279, 285 




Kolmogorov, A. N., 350, 353,
356, 360 

Komlós, J., 143, 155

Kottegoda, N. T., 354, 360 

Kote; Sogs57g862,089, 855.132, 
134, 13990195 9R20298 214; 
2163 (217572472 262 [82545 
2573 2729 270sb2146eg2z79, 
2865 ° 2875 £289,0290, 2943 
293, 29593035 307, (S135 
B14 SEZ 

Kounias, E. G., 316, 317 

Krishnaiah, PscR. , 224562295 
27552 279 58-287 

Krishnamoorthy, A. S., 279, 
287 

Krivyakova, E. N., 147, 155 

Kullback, S., 262, 270, 368, 
385 

Kw6rel 33Se Mast 31556 31¢ 


LaBreeque 7 J8% 5588, 22 

Laird, N. Mis 10S, 2108; FTEgs 
124 

Lam; .Goer. 78275.. 286 

Landenna, G., 193 

Langberg, N. A., 343, 347,
350, 359
Laud »: Ps W.,2319,0320, 3323) 
325 


Laurent, A. G., 333

Lawrance, A. J., 354, 360 

Lee, As, eli2zg 125 

Lee, Les: (2808 £287 

Lée ele elitgt 272883 

Lehmann, E. L., 44, 48, 88, 
89 31945 335058360 

Leibler, R. A., 368, 385 

Leinhardt, S., 411, 414 

Leon, R. V., 350, 359

Lether, E. Mis 243; 257 

Levy, -Ba, 3565, 360 

Lewis) $1.2 035m 2 O4n 2 055 
209, 213 

Li Grie Gee yO 

Lillieftors, Haw.) 49, 23 

LinteGseiCrae Lae 2208 156s lion, 

Lindley, D. V., 369, 385 




Lindsay, B. G., 95, 96, 98, 
105, 109 

iipowaeM. , 26, ¢2n5 «33 

Liptak, T., £60 , 2167, 

ELeEtelL “RSEC 781605 ,167 

Liu, «Vit 2745 286 

Lhoyd ; .DESKo¢ {262627 4233 

Locke}. CS$112 -a23 

Lockhart, R., 313, 317

Lott, J A, SBE. S24 


Macdonald, P. D. M., 112, 124 

MacLaren, M. D., 120, 124 

Mahfoud, M., 331, 333 

Major’ Pues l4sisi44 

Mandelbrot, B. B., 354, 355,
356, 359, 361

Malgk, He tJ 66132 

Malkovich 4. Eohe., ~ibe 23 

Mann, vot Bagw222.-229 

Mann, ~ Na Re, 14°23, 272, 
287 

Marasini, D., 193 

MAviGuUS sa:Kssn/ Oy / 08 OU, 
82.835 845 85 

Mardias2AK.iVa5..242% 243, 
DATS e251 

Marsaglia, G., 120, 124 

Marshall, At.W.; 169,.,170; 
LIZEELBOR 2749 23562773 
278, 279%a 281,4 282522833 
284% 2866. 290; .291542933 
3075, 3085- 314, . 317913205 
325, 341, 342, 347 

Martynov, G. V., 147, 155 

MaphatecA, SMe, plo ,peL67 

Mauchly, 23% Wi, 225, 1229 

Mazumdar, S., 77, 85 

Mehrotra, K. G., 172, 180,
275, 287

Meilijson, I., 327, 334 

Meir, A., 409, 414 

McCloskey, J. W., 61, 72 

MeGirr, Ey M., 3665 3629 369; 
372, 385 

Michalek\ fd, .Et4 172;. i180; 
TEAS 

Milés, sR: Ee 9238/7, 3909f 395 

Moeschberger, M. L., 280, 287, 
336, 346 




Molluzzo, J., 413 

Moon, J., 409, 414 

Moore, A. A., 134, 139 

Moore, M. F., 384 

Moore, R. H., 204, 214 

Mofan “PS ASTER? }8587 75394, 
395 

Moranda, P. B., 76, 83, 84, 
85 

Morgenstern, D., 290, 308 

Morrison, D. G., 328, 331, 
333, 334 

Muced, eh. 64 Ose50 7 

Mudholkar, G. S., 13, 22, 
L57, .1458, .160% 167; 
1945 2027219 

Mumme, D. C., 127

Muth jE...) .7,,5327, 83285-4329 ; 
334 


Nadas, A., 340, 347 
Nagao, H., 157, 158, 167 


Nagarsanker, B. N., 225, 226, 
229 

Narudasy StiGs.,..34335,2346 

Nath, G. B., 79, 85

Natvics SBi.,, 4515, <317 

Navin y@he Posh. 49235. 917 

Nayer cP Pins. 2220 56229 

Nelson, W. C., 60, 72 

Neyman, J., 72, 220, 230 

Nixon, 214 


Oikingal,.,2269, “170, 172, 180, 
2743 27550 278382799 028a5 
2874. 2905 2939¢ 3087 341, 
3425 °347 

Oosterhoff, J., 160, 168 

Orden) .oK.§ blige) 23. 2425c25F 


Pagé,oE1S.5 2182, 4191 

Parrish, Ry Ss s.241, e25256254, 
257 

Parthasarathy, M., 279, 287 

Patankar, V. N., 49 




Patil, -G. P. £157, 59, e076; 
62, 68).:69, 078, 72; Lbs 
1235 11767 1180) 225, °2565 
290, 293, 307 

Paulson, A. S., 207, 214, 
277, 278, 279h 285, n2es 
288 

Peacock, J. B., 132

Pearson; &. S., 221, 2278214, 
220,.0230, -25457298 

Pearson, K., 112, 125

Peli td, L235 7358,, 8300 

Pesarints be eo leo) 2a 0 

Peters Be eG. gle 2eo 2D 

Petersonms Aves sw Ge nO 

Pettitt.) A eNns al Os. bie 23s 
56 

Phileippowge Ney wO2 22 

Phiiiecpisne be Dien GOO waoa. 
Bos 

Pillai. K.ACk Stiga 2250 2265 
229 

Policeliio, 9G. Eee. ele 25 

Pollock asa. 2osesS 

Pradhan Menus ones i, 

Prestony Es ed, i 2p 25 

Proschan, Ha, e2/2yne Loan eis 
278,527 9),0260, 02025 
28355284, 52055. 200, 
288, USl0 sole sib. eS hie 
31955 320% 322, . 325sma2ae 
833, (843, 347, -s50Nes5oR 
360 

tanh IIe San Sols 9 6ti7/ 


OuandeyaR hss slide Lele 
2 2 bee oe 

Quenouille, M. H., 62, 73

Ominzi, VAs, p43, 1 34h 


Raiffa, H., 366, 385

Raniseye J. 'B,,oll2,P114, dae 
28, 125 

RAOMeb ER. , 77, 85 

Raogmeemet. , 112,01 25en224 
226,02 30 

RaOwomors. S23, 325 

Rao, Meee 275, 279, 287



Ratnaparkhi, M. V., 331, 334 

Rayment, *P2"R., 292, 225 

Reid, W. T., 27, 32

Révész, P., 143, 155 

Reynolds, D. S., 323, 325 

Ricci) A. ,’ 194% 13471399 

Ringers List. gaisyece 

Robertson, C. A., 112, 124, 
125 

Robinson, R., 405, 414 

Rose, D. M., 340, 347 

Rosenblatt, M., 141, 147, 
14930152, 054 15750635 
180$f852323961 

Ross, S: M., °284)°2885 350; 
361 

Roussas, G. G., 62, 72

Roy, "SG." N.,° FPR23, VASh, aes 


159, 168 

Rozanov, Ya. A., 44, 49, 353,
360

Rubin,’ Dt, B.,» 103,- LOSssi19, 
124 


Sagriista,~ St NOt 2905 308 

Samptord§ MieR.5177,2885 

Sanathanan, Ll. Piee745er3, Sy 
SISS84N085 

Sarkaddh, (Ke; 4, 823 56 

Sathes<Y¥s S65 S31682317 

Sauerders, R., 319 

SavageeteL. Ri) 3232 325 

Savitss {0 AHie 27as, 2808 2615 
2825 2835 284% 285¢ 350; 
360 

Schafer JiR. CBS, - 272 e287 

Scheaffér, CR! 1.99330, 9334 

Scheuer, Ee Min 145) 23 

Schever,.EsiM. 2/5 285a62 

Sen, A. Ki,» 182, 2883, 2790; 
191 

Sen, P. K.,843, 49 

Sethuraman, J., 283, 284, 286, 
8285, 325 

Schwenk, A., 405, 414 

Shahives.ebs, 316. 37 

Shaked, M., 283, 287, 350, 
360 

Shannon, C. E., 368, 385 




Shapiro, SS, 83,. 4, See, 
7S 72S =, 17 5.329, 0 25625, 
56 
Shaw, L., 279, 287 
Shen, S. M., 378, 384 
Shenton «Lic RK: 5.1124 22,4375 
13920 2315623250240 
Shimizu, R., 59, 60, 73
Shirahata, S., 172, 180 
Sibuya, M., 59, 60, 73 
Sidak, Z., 170, 180 
SiidiquinwMseM.,5327, 330, 
333 
Siegel, S., 366, 384 
Silverman, E., 27, 32 
Simar tise, £97,099 52405, 209 
Simoni, de, S., 308 
SinghjeM.,. 2035. 2125. 214 
Singpurwal lajeN.. Dt, 272, 282, 
285.6 26650207 
Sinha Be Ke, 112, 123 
Sdotand= Mo, 6207, 8214 
SmithseWiob.«f 795.85 
Solovyev, A. D., 13, 22
Spurrdier,id« Ds5.280, 5288 
Srivastava. M, Sa, -L8i, 182, 
VESSEL 90419159226, 230 
Stachosel., e205, L460, 155 
Stelda,gA; The 5055 360 
Stephens, M. A., 4, 9, 10, 
PIS Zon es 20 
Steutel,’ F.enW.5° 63:73 
Stuart, nA .cdeige22 
SttyenteH 1S.48 2905308 
Sabbaiahs P.Gads7q a9450 202 
Subrahmanian, K., 66, 73 
Silloeeel ety ee OE Giese ZOO 
Suppes, P., 366, 384 


Tadikamalla, P. R., 231, 240 

Tanks .Wsewenn L25 

Taqqu, M. S., 354, 359, 361

Teyilorys Died. .Reel25¢ 125 

Taylors Rie 366,2 367, 369, 
B72e 384), 385 

Teicher, H., 96, 109

Thompson, W. A., 280, 287, 
291, 307 

Dietjen.eGeL.,811,) 22, 204, 
214 




Piku, Most ,« 203420447 205), 
207,50 2115 Wi? 21 Gets 
216 

Trivedi {0M WC..,a Sy muliG7awe2 9), 

Tsiatis, A., 340, 347 

Tubbs, aJeuDs, 12s 125 

Tukeys Ji 0W.§ 2593626950270 

Tukey yoWiel.J4793 185 

Tusnady;¢G:, #143341445 155, 
156 

tagetls Sis. (65.5 ZS, PAehy/ 

Ahab Vai IMas IA) AUIS) 


Uppulurd [eye Ree Rese2.7 e279. 
287, 288 
Uven, van M. J., 290, 308 


Van Ness, J. W., 356, 361 
van Uvien, Mead 242...257 
Vasicek,: OF, 125-424 
Vianeldi eS. 50137, 139 
VoOSsSseRion. 2 oDon ESO 


Wald, A., 222, 229 
Nadkexr,- Herk., 112, 125 
Walkup, D. W., 350, 360
Waller ;e Jeol. 5377, 385 
Walddicheds Regeso5, 36 
Wasserman, S., 411, 414 


Watson.) Dk4 e505.885582, 85 

Watson, G. S., 36, 49 

Watson, G. S., 328, 334

Weler.2 DeiRe, loo, 25 1735 
179, 180, 280, 288 

Weinman, 279 

WednrichteM wc 285 33 

Weisberg, S., 4, 24 

WedisssnGs* Alte 326,0354 

Wells; 9We 1.5. 320,))334 

West, S. A., 384 

White, dielenmese 25h 

Wicksell, 275 

Willcom cms steer gimeG: sacle me ode 
£9 235 

WithkSemOniOn, O95 24,5 LOB, 
NO She 20S .e2 045ue 2075 
124 

Wideltomshe Clb. 07, 72 

Wattes wise dlnaa 7, 86 

Wolo: Jnmvis sul 2 O59 23,925 




SUBJECT INDEX 


approximations, 231 

asymptotically distribution- 
free test, 51 

asymptotic expansions, 75


bivariate distribution, 241 

bivariate exponential, 169 

bivariate logarithmic series 
distribution, 57 


censored samples, 1, 203 
characteristic function, 271 
characterization, 289 
characterizations, 327 
chi-square statistics, 25 
chi-square tests, 35 
chi-squared goodness-of-fit,
Su 
coherent system, 309 
combination of tests, 157 
competing risks, 335 
complementary risks, 335 
component life, 309 
consistent tests, 193 
convexity, 387 
Cramér-von Mises type tests 
of independence, 141 


degree of uncertainty, 363 

dependence concepts, 349 

dependent model, 309 

detecting a change point, 181 

digraphs, 397 

discrete distributions, 259 

distributions, 397 

distributions of minimum and 
maximum, 335 

distribution of test statistics,
181

distribution tables and rates
of convergence for both, 141


distributional tests, 1


EDF tests, 1

empirical processes, 141 
estimating n, 75 
estimation, 127 


evaluation of distribution 
functions, 241 

exact slopes, 157 

exponential distribution, 289 

exponential families, 87 

exponential family, 95 

extreme value distribution,
309


failure rate, 319, 327 
families of statistical dis- 
CEAbDUtIOnIS, DL 
feature selection discrepancy, 
363 
fractional Brownian motion, 
349 


gaussian mixtures, 111 
geometrical probability, 387 
gestation model, 271 
goodness-of-fit tests, 35, 51 
graphs, 397 


hazard rate, 289, 309 
hypothesis testing, 193 


identifiability, 335 
independence between two 
sets of variates, 219 
inequalities, 309 
inference discrepancy, 363 
inferential tasks, 363 
information gain index, 363 
invariance principles, 141 
isoprobability contour, 289 


Johnson's system, 231 
Kendall's tau, 169


least squares, 231 

life testing, 169 

likelihood ratio test, 25 

likelihood ratio test and 
estimate, 181 

linear processes, 35 

locally most powerful rank 
test, 169 




logarithmic transform, 289 

logarithmic transformation, 
133 

logistic normal distribu- 
tions, 363 


maximum likelihood, 87, 
aaLib 
maximum likelihood estimators, 
25, 
maximum likelihood esti- 
mators, 95 
mean residual life function,
327
mixing, 349 
mixing distributions, 95 
mixture, 327, 
mixtures, 95 
mixtures of distributions, 111
modes of genesis, 57 
modified maximum likelihood 
estimators, 203 
moments, 231 
multivariate Cramér-von Mises
statistic, 141
multivariate distribution, 
309 
multivariate distributions, 
289 
multivariate exponential dis- 
tribution, 271 
multivariate exponential 
extensions, 271 
multivariate gamma distri- 
bution, 271 
multivariate IFR distribu- 
tions, 271 
multivariate IFRA, 271 
multivariate logarithmic 
series distribution, 57 
multivariate mean, 181 
multivariate NBU distri- 
butions, 271 
multivariate NBUE distribu- 
tions, 271 
multivariate outliers, 203 
multivariate Weibull distri- 
bution, 271 




noniterative estimation, 87 
normal mixtures, 111 
normative models, 363 
nuisance parameters, 51 
numerical approximation, 127 


objective grouping, 57 


parameter estimation, 133 
Pareto distribution, 127 
Pearson system, 241 
performance analysis, 363 
Pitman relative efficiency, 
169 
Poisson process, 319 
power-law spectra, 349 
power study, 169 
power transform, 289 
probabilistic data, 363 


quadrature, 241 


random secants, 387 
rational fraction, 231 
reliability, 335 
renewal process, 327 
robust estimators, 203 


sample size estimation, 75 

second order efficiency, 75 

self-similarity, 349 

series and parallel systems,
335

shock models, 271 

shock model, 319 

Spearman's rho, 169 

sphericity, 219 

statistics, 397 

stepdown procedure, 157 

stochastic expansion, 25 

stochastic failure rate, 319 

stochastic processes, 349 

strictly stationary and 
strong mixing processes, 35 

subject inference, 363 

survival analysis, 327 

system life, 309 




t-distribution, 193 

test for lognormality, 133 
test of independence, 169 
threshold model, 271 
transformation, 231 
transformations, 259 
truncated samples, 75 
t-tests, 193 


unbiased tests, 193 


weak convergence, 319 
Weibull, 279, 280 
Weibull distribution, 309, 319 
weighted Cramér-von Mises
test, 51 
W tests, 1 

